C++ Next: 04/01/2011

Wednesday, April 27, 2011

Find depth of a k-ary tree given a parent array : Solution 2

The problem statement is mentioned here. The first solution is here.

We can reduce some processing by observation that we only need to find the depth of leaf nodes. Any node whose index appears in the parent array $P$ is an internal node and do not need to be processed. The modified solution is as below:

Let L be an array of length N
Initialize all elements of L to true
for each node i from 0 to N-1
   if P[i] is not -1
       set L[P[i]] to false
   end if
end for
initialize maxDepth to 0
for each node i from 0 to N-1
    if L[i] is false, continue next node
    set parent to P[i]
    set depth to 1
    while parent is not -1
       set parent to P[parent]
   increment depth
    end while
    if maxDepth < depth
       set maxDepth = depth
    end if
end for

As can be seen, the time complexity of the algorithm is $O(N^2)$ where $N$ is the number of nodes. Thus the worst case is still not improved. The space complexity is $O(N)$.

Monday, April 25, 2011

Railway Quota Allotment Problem

I describe the following problem and in future posts will try to find optimal solutions to the problem. The problem statement is as follows:

The Indian Railway maintains a quota based system where each station is given some number of seats. These seats are located in different kinds of classes, say sleeper, 3-tier AC and 2-tier AC. Every train has different number of compartments and so the length of the trains differ. Every stations has one entry point where passengers come on the platform and they walk to the compartment where quota is allotted for that particular station and then take the seat. Our aim is to minimize the distance passengers have to walk.

To simplify the problem we will assume following:

There is only one class of compartment in each train. So all the passengers travel to that compartment only.
Length of each train is same, i.e. $l$.
There are $N$ stations with entry points $e_i, 1 \le i \le N $ such that $ 0 \le e_i \le l $. You can see that $e_i$ is expressed as a distance from the start of the train.
Every station occupies $x_i$ length of the train such that $ 0 < x_i <= l, 1 <= i <= N$. It is easy to see that $ \displaystyle\sum\limits_{i=1}^N x_i = l $. If station $i$ is allotted quota at distance $d_i$ from start of the train, the next quota allotment can happen only at distance $d_i + x_i$.
The quota allotted to each station is at $q_i$ distance from the start of the train such that $0 <= q_i < l, 1 <= i <= N$. The problem is to find $q_i$ for each station such as to optimize $ \displaystyle\sum\limits_{i=1}^{N} | e_i - q_i | $. See that not all the passengers at station $s_i$ has to travel till $q_i$ but to simplify the things, we assume so.

The image on the left shows a sample problem for two stations and a probable solution. The total distance travelled by passengers on both the stations is $e_1 + e_2 - x_1$.

There are several methods that I will try to focus on including greedy method and dynamic programming since the problem is to find an optimal solution. I will focus on finding whether the solution has an optimal sub-structure. The number of different possible solutions depend on the permutations of stations and so is of the order of $N!$. The problem becomes more complicated by usage of absolute value of $ e_i - q_i $. I haven't seen this kind of problem in literature so I don't think this will be solved easily by me.

Friday, April 1, 2011

Code Complete by Steve McConnell : Chapter 10

The author discusses points in using variables in this chapter:

Limit the scope of a variable as much as possible.
Keep references of a variable together
Initialize a variable as close to its use
Don't assign value to a variable until just before the value is used

The author advises to check that the programmer is not using a variable whose life span is already over and the value is stale. The author recommends binding of value to a variable as late as possible. The author provides example of binding a variable's value at compile time and run time.

The author then discusses a British computer scientist Michael Jackson's structured programming method which develops a general relationship between data structures and control structures.
The author suggests to use each temporary variable for only one purpose and avoid names like temp for a variable. The author suggests avoiding use of same variable to store values and using special values for error.
There shouldn't be unreferenced/unused variables in the code.
The author then provides list of problems in using global variables:

Use of global variables is against principle of information hiding.
Use of global variables increases the risk of side-effects.
Global variables increases the risk of aliasing, i.e. referring to the same variable with two different names.
Re-entrant code will not work correctly with global variables.
Global data limits the scope of code reuse.
Global data damages modularity and intellectual manageability.

The author then provides some valid reasons for usage of global data.
The author then provides some guidelines to reduce the risk of using global data:

Use naming convention to differentiate global variables.
Use locking to control access to global variables.

The author then provides a list of benefits to use access routines for global data:

The implementation of global data remains hidden from the rest of the program and can be changed when required without impacting the rest of the program.

The author then provides some rules on how to use access routines for global data.

Code Complete by Steve McConnell : Chapter 9

In this chapter, the author focuses on how to choose good names for data. The author suggests to have a readable, memorable and appropriate name for a variable. The author provides following guidelines for choosing a variable name:

The name of the variable should fully and accurately describe the entity the variable represents. The name would be long and easy to decipher. The author then provides some examples of good and bad variable names.
Variables with loop scope can be given short names and variables with global scope should be given long names.
Variables that contain computed values should have their names modified with a suffix representing the computed value, i.e. variables holding total, maximum, average should have total, max and avg as the suffix in the name. This convention makes sure that the significant part of the name comes first and duplicate names with the qualifier at the front and at the end, e.g. countEmployee and employeeCount are avoided.
Make use of precise opposites in names.

The author then provides guidelines for naming data in several common usage:

Loop Indexes:

If the variable scope is within a loop, short names like i, j and k can be used.
If the loop is longer or there are nested loops, it is better to give proper variable names to avoid index cross-talk.

Status Variables: Status variables should never have flag in its name. Flags should be assigned values and their values should be tested with enumerated types, named constants and global variables that act as named constants.
Temporary Variables: Temporary variables are sign that the programmer does not yet fully understand the problem. Programmer would treat temporary variables casually increasing the chances of error.
Boolean Variables: Use typical boolean names like Done, Error, Found and Success. Avoid negative names like NotFound or NotDone.
Enumerated Types: In naming members of an enumerated type, use group prefix or suffix.
Named Constants: The name should represent the abstract entity rather than the constant value it refers to.

The author then provides some points on why, when and how to create naming convention.
The naming conventions provide several benefits:

No local decisions are required to be taken allowing to concentrate on more important characteristics of the code.
Reading source code across projects become easier.
They help in learning code more quickly.
They reduce name proliferation.
They compensate for language weaknesses.
They emphasize relationships among related items.

The author enlists following reasons to have naming conventions:

When multiple programmers work on a project
When one programmer hands over a program to another for modification and maintenance
When program written by one programmer are reviewed by another
When the program is large in size
When standard terms and abbreviations are required in coding

The author then provides some naming guidelines for language independent features:

Global variables should be easily identifiable with g_ prefix.
Module variables should be easily identifiable with m_ prefix.
Use a prefix or suffix to identify type definitions.
Named constants need to be distinguishable from variables.
Enumerated types should be distinguishable from named constants, variables and functions.
Input only parameters should be identifiable in languages that don't enforce them.
Names should be formatted with separators like _ to enhance readability.

The author then provides naming guidelines for languages like C, Pascal and Fortran.
The author then discusses Hungarian Naming Convention for C programming language and shows its advantages and disadvantages.
The author advises not to use short names but still provides some preferred ways to shorten names if need arises. The author recommends not to use phonetic abbreviations for variable names.
The author then provides some rules to avoid traps in creating abbreviations:

Don't abbreviate by removing just one character.
Abbreviate consistently across the whole program with the same abbreviation
Create names that can be pronounced
Always document short names with translation tables

The author then provides some names to avoid:

Misleading names and abbreviations
Names with similar meanings
Similar Names with different meanings
Names that sound similar phonetically
Numerals in names
Misspelled words
Words that are commonly misspelled
Names with difference in just capitalization
Names of standard library routines and language keywords
Names with hard to read character (e.g. l and I (el and eye), 0 and O (zero and ooh), etc... )