Wednesday, April 27, 2011

Find depth of a k-ary tree given a parent array : Solution 2

The problem statement is mentioned here. The first solution is here.

We can reduce some processing by observation that we only need to find the depth of leaf nodes. Any node whose index appears in the parent array $P$ is an internal node and do not need to be processed. The modified solution is as below:
Let L be an array of length N
Initialize all elements of L to true
for each node i from 0 to N-1
   if P[i] is not -1
       set L[P[i]] to false
   end if
end for

initialize maxDepth to 0
for each node i from 0 to N-1
    if L[i] is false, continue next node
    set parent to P[i]
    set depth to 1
    while parent is not -1
       set parent to P[parent]
       increment depth
    end while
    if maxDepth < depth
       set maxDepth = depth
    end if
end for
As can be seen, the time complexity of the algorithm is $O(N^2)$ where $N$ is the number of nodes. Thus the worst case is still not improved. The space complexity is $O(N)$.

Monday, April 25, 2011

Railway Quota Allotment Problem

I describe the following problem and in future posts will try to find optimal solutions to the problem. The problem statement is as follows:

The Indian Railway maintains a quota based system where each station is given some number of seats. These seats are located in different kinds of classes, say sleeper, 3-tier AC and 2-tier AC. Every train has different number of compartments and so the length of the trains differ. Every stations has one entry point where passengers come on the platform and they walk to the compartment where quota is allotted for that particular station and then take the seat. Our aim is to minimize the distance passengers have to walk.

To simplify the problem we will assume following:
  1. There is only one class of compartment in each train. So all the passengers travel to that compartment only.
  2. Length of each train is same, i.e. $l$.
  3. There are $N$ stations with entry points $e_i, 1 \le i \le N $ such that $ 0 \le e_i \le l $. You can see that $e_i$ is expressed as a distance from the start of the train.
  4. Every station occupies $x_i$ length of the train such that $ 0 < x_i <= l, 1 <= i <= N$. It is easy to see that $ \displaystyle\sum\limits_{i=1}^N x_i = l $. If station $i$ is allotted quota at distance $d_i$ from start of the train, the next quota allotment can happen only at distance $d_i + x_i$.
  5. The quota allotted to each station is at $q_i$ distance from the start of the train such that $0 <= q_i < l, 1 <= i <= N$. The problem is to find $q_i$ for each station such as to optimize $ \displaystyle\sum\limits_{i=1}^{N} | e_i - q_i | $. See that not all the passengers at station $s_i$ has to travel till $q_i$ but to simplify the things, we assume so.
The image on the left shows a sample problem for two stations and a probable solution. The total distance travelled by passengers on both the stations is $e_1 + e_2 - x_1$.

There are several methods that I will try to focus on including greedy method and dynamic programming since the problem is to find an optimal solution. I will focus on finding whether the solution has an optimal sub-structure. The number of different possible solutions depend on the permutations of stations and so is of the order of $N!$. The problem becomes more complicated by usage of absolute value of $ e_i - q_i $. I haven't seen this kind of problem in literature so I don't think this will be solved easily by me.


Friday, April 1, 2011

Code Complete by Steve McConnell : Chapter 10

The author discusses points in using variables in this chapter:

  • Limit the scope of a variable as much as possible.
  • Keep references of a variable together
  • Initialize a variable as close to its use
  • Don't assign value to a variable until just before the value is used
The author advises to check that the programmer is not using a variable whose life span is already over and the value is stale. The author recommends binding of value to a variable as late as possible. The author provides example of binding a variable's value at compile time and run time.

The author then discusses a British computer scientist Michael Jackson's structured programming method which develops a general relationship between data structures and control structures.
The author suggests to use each temporary variable for only one purpose and avoid names like temp for a variable. The author suggests avoiding use of same variable to store values and using special values for error.
There shouldn't be unreferenced/unused variables in the code.
The author then provides list of problems in using global variables:
  • Use of global variables is against principle of information hiding.
  • Use of global variables increases the risk of side-effects.
  • Global variables increases the risk of aliasing, i.e. referring to the same variable with two different names.
  • Re-entrant code will not work correctly with global variables.
  • Global data limits the scope of code reuse.
  • Global data damages modularity and intellectual manageability.
The author then provides some valid reasons for usage of global data.
The author then provides some guidelines to reduce the risk of using global data:
  • Use naming convention to differentiate global variables.
  • Use locking to control access to global variables.
The author then provides a list of benefits to use access routines for global data:
  • The implementation of global data remains hidden from the rest of the program and can be changed when required without impacting the rest of the program.
The author then provides some rules on how to use access routines for global data.

Code Complete by Steve McConnell : Chapter 9

In this chapter, the author focuses on how to choose good names for data. The author suggests to have a readable, memorable and appropriate name for a variable. The author provides following guidelines for choosing a variable name:
  1. The name of the variable should fully and accurately describe the entity the variable represents. The name would be long and easy to decipher. The author then provides some examples of good and bad variable names.
  2. Variables with loop scope can be given short names and variables with global scope should be given long names.
  3. Variables that contain computed values should have their names modified with a suffix representing the computed value, i.e. variables holding total, maximum, average should have total, max and avg as the suffix in the name. This convention makes sure that the significant part of the name comes first and duplicate names with the qualifier at the front and at the end, e.g. countEmployee and employeeCount are avoided.
  4. Make use of precise opposites in names.
The author then provides guidelines for naming data in several common usage:
  1. Loop Indexes:
    • If the variable scope is within a loop, short names like i, j and k can be used.
    • If the loop is longer or there are nested loops, it is better to give proper variable names to avoid index cross-talk.
  2. Status Variables: Status variables should never have flag in its name. Flags should be assigned values and their values should be tested with enumerated types, named constants and global variables that act as named constants.
  3. Temporary Variables: Temporary variables are sign that the programmer does not yet fully understand the problem. Programmer would treat temporary variables casually increasing the chances of error.
  4. Boolean Variables: Use typical boolean names like Done, Error, Found and Success. Avoid negative names like NotFound or NotDone.
  5. Enumerated Types: In naming members of an enumerated type, use group prefix or suffix.
  6. Named Constants: The name should represent the abstract entity rather than the constant value it refers to.
The author then provides some points on why, when and how to create naming convention.
The naming conventions provide several benefits:
  • No local decisions are required to be taken allowing to concentrate on more important characteristics of the code.
  • Reading source code across projects become easier.
  • They help in learning code more quickly.
  • They reduce name proliferation.
  • They compensate for language weaknesses.
  • They emphasize relationships among related items.
The author enlists following reasons to have naming conventions:
  • When multiple programmers work on a project
  • When one programmer hands over a program to another for modification and maintenance
  • When program written by one programmer are reviewed by another
  • When the program is large in size
  • When standard terms and abbreviations are required in coding
The author then provides some naming guidelines for language independent features:
  • Global variables should be easily identifiable with g_ prefix.
  • Module variables should be easily identifiable with m_ prefix.
  • Use a prefix or suffix to identify type definitions.
  • Named constants need to be distinguishable from variables.
  • Enumerated types should be distinguishable from named constants, variables and functions.
  • Input only parameters should be identifiable in languages that don't enforce them.
  • Names should be formatted with separators like _ to enhance readability.
The author then provides naming guidelines for languages like C, Pascal and Fortran.
The author then discusses Hungarian Naming Convention for C programming language and shows its advantages and disadvantages.
The author advises not to use short names but still provides some preferred ways to shorten names if need arises. The author recommends not to use phonetic abbreviations for variable names.
The author then provides some rules to avoid traps in creating abbreviations:
  • Don't abbreviate by removing just one character.
  • Abbreviate consistently across the whole program with the same abbreviation
  • Create names that can be pronounced
  • Always document short names with translation tables
The author then provides some names to avoid:
  • Misleading names and abbreviations
  • Names with similar meanings
  • Similar Names with different meanings
  • Names that sound similar phonetically
  • Numerals in names
  • Misspelled words
  • Words that are commonly misspelled
  • Names with difference in just capitalization
  • Names of standard library routines and language keywords
  • Names with hard to read character (e.g. l and I (el and eye), 0 and O (zero and ooh), etc... )

Saturday, March 19, 2011

Code Complete by Steve McConnell : Chapter 8

The author recommends defining major data structures during requirements analysis and architecture phase.

The author then provides a data literacy test to identify what expertise the reader posses in understanding data structures. He also recommends reading some books if the reader is not an expert in data structures.

The author then enlists reasons to create own data types:
  • Make modifications easier in future
  • Avoid excessive information distribution
  • Increase Reliability
  • Compensate for language weaknesses
The author then provides guidelines for creating own data types:
  • Create types with problem domain names
  • Do not use predefined types (like int, float in C++) directly in the code except in type definitions
  • Never redefine predefined types since it can create confusion in the mind of reader
  • Define substitute types for portability
  • Create types using other created types
The author then provides some guidelines for variable declarations:
  • Use a template for variable declaration
  • Turn of implicit declarations
  • Use naming conventions
The author emphasizes the importance of data initialization. Improper initialization can result from several situations:
  • A variable is never initialized
  • A variable's value is outdated
  • Only a part of the variable is assigned a value
The author then provides some guidelines for data initializations:
  • Check input parameters for validity
  • Initialize each variable close to where it is used
  • Check the need for reinitialization
  • Initialize named constants once, initialize variables with executable code
  • Initialize each variable as it's declared
  • Use compiler setting to initialize all variables
  • Use compiler warnings for uninitialized variables
  • Use a memory access checker to check for bad pointers
  • Initialize working memory at the beginning of the program

Friday, March 18, 2011

Code Complete by Steve McConnell : Chapter 7

The author defines software design as a scheme of turning a specification of a computer program into an operational program. Design links requirements specification to coding and debugging. The author points out that design is a heuristic process and requires creativity and insight.

The author tries to distinguish between designs done in large and small projects. In large projects, design might be an activity distinct from requirements and coding and carried out by different set of people. The design may be divided in several stages like software architecture, high-level module design and detailed implementation design. Design work continues even after the design phase is over.

On small projects, design might involve just writing the PDL before coding it in the programming language.

The author then discusses design at different levels:
  • Division into subsystems: This involves identification of major components in the software and interfaces/interaction between them.
  • Division into modules: If a subsystem is too big to translate to module, this level ensures that the subsystem has been decomposed to a level of detail fine enough to implement it as a set of modules.
  • Division into routines: This involves dividing a module into service routines and specifying syntactic details of service routines.
  • Internal routine design: This requires writing PDL, investigating algorithms, deciding on organization of code and then coding in programming language.
The author then discusses two approaches of structured design:
  • Top Down Decomposition: It is also called top-down design or stepwise refinement. It involves moving from a general statement of what the program does to detailed statements about specific tasks that are performed. Following steps should be followed in top down decomposition:
    • Design top levels first before moving to lower levels
    • Avoid language specific details
    • Postpone details to lower level design
    • Formalize and Verify each level
    • Move to next level for next set of refinement
The author compares the technique with divide and conquer. The author advises to continue decomposition until it seems it is easier to code then decompose.
  • Bottom Up Composition: Following steps need to be followed in bottom up decomposition:
    • Identify low level capabilities from the description of the problem
    • Identify common aspects of low level capabilities and group them
    • Continue with the next level up
The author enlists following strengths of top-down approach:
  1. It is easy
  2. Implementation details can be deferred
Following weaknesses are also mentioned:
  1. Systems not hierarchical in nature are difficult to design using this approach
  2. This approach requires single function at the top which is different from modern event driven systems
  3. Top level function of a system is hard to identify
The author provides following strengths of bottom-up approach:
  1. It identifies utility routines early which results in compact design
  2. It results in a lot of reuse
The following weaknesses of the bottom-up approach are mentioned:
  1. This approach cannot be used exclusively. It needs to be combined with top-down approach.
  2. Sometimes it becomes difficult to assemble pieces to build the system.
The author then discusses object-oriented design. The author defines the object-oriented design process as identification of real-world and abstract objects and representation of this objects by programming language objects. The author clarifies that object-oriented design works at a higher level of abstraction than structured design and the building of design is based on the data which makes it more stable approach.

The author then discusses following ideas related to object-oriented programming:
  1. Abstraction: Object oriented design allows to ignore irrelevant details.
    The unit of abstraction in structured design is a function while in object-oriented design; it is an object. Since object has both functions and data, object-oriented design provides bigger intellectual building blocks than structured design.
  2. Encapsulation: It involves principles of information hiding.
  3. Modularity: It is nothing but grouping of related services and data into modules that are highly cohesive and loosely coupled.
  4. Hierarchy and Inheritance: This involves identifying similarities and differences between different objects.
  5. Objects and classes
The author provides following steps for object-oriented design:
  • Identify objects and their attributes
  • Determine what can be done to each object
  • Determine what each object can do to other objects
  • Determine parts of each object that will be visible
  • Define each object's public interface
The author classifies objects in four main categories:
  1. Problem Domain Component
  2. User Interface Component
  3. Task Management Component
  4. Data Management Component
The author points out that without some object in each of these categories makes an incomplete design.

The author then compares following popular design approaches:
  1. Structured Design: It works well for systems with several hundred lines of code.
  2. Information Hiding: This approach should be used whenever possible.
  3. Object Oriented Design: It works well for systems with more than hundred thousand to over a million lines of code.
The author suggests not to restrict to using only one approach. The author points out following three traits of the design process:
  • Design is a sloppy process
  • Design is a wicked problem
  • Design is a heuristic process: The author provides reference to G Polya's book, How to Solve It and summarizes the approach that he suggested in his book.
When the design gets complete, it should have following characteristics:
  1. Intellectual Manageability
  2. Low Complexity
  3. Ease of Maintenance
  4. Minimal Connectedness
  5. Extensibility
  6. Reusability
  7. High fan-in
  8. Low to medium fan-out
  9. Portability
  10. Leannes
  11. Stratified Design
  12. Standard Techniques

Wednesday, March 2, 2011

Find depth of a k-ary tree given a parent array : Solution 1

The problem statement is mentioned here.

Following is the first solution.
initialize maxDepth to 0
for each node i from 0 to N-1
    set parent to P[i]
    set depth to 1
    while parent is not -1
       set parent to P[parent]
       increment depth
    end while
    if maxDepth < depth
       set maxDepth = depth
    end if
end for

As can be seen, the time complexity of the algorithm is $O(N^2)$ where $N$ is the number of nodes. The space complexity is $O(1)$.