C++ Next

Saturday, May 14, 2011

Code Complete by Steve McConnell : Chapter 12

The author discusses tips to use user-defined data types. The authors suggest to use structured data to simplify operations on blocks of data and clarify data relationships. Structured data simplify parameter passing and reduce maintenance.

The authors then discuss the benefits of using table look-ups instead of complicated conditional statements. The authors discuss three techniques to look up an entry in the table:

Direct Access - The author discusses two ways, directly using data as a key to table and the other is fudging the table-lookup keys.
Indexed Access - This works similar to handling of virtual memory address to physical memory address in operating systems.
Stair-step Access - The entries in a table are valid for ranges of data. The authors also provide some suggestions on using the technique effectively.

The authors then discuss the advantages of using abstract data types (ADTs) such as:

Hide implementation details
It's easier to improve performance
Changes are localized
The program becomes self-documenting

The authors then give some examples of ADTs for real world entities and also some guidelines of designing ADTs.

Code Complete by Steve McConnell : Chapter 11

This chapter provides some tips to use primitive data types.

Avoid using number literals directly. Use named constants instead. This makes code easily modifiable and readable.
Anticipate divide by zero errors.
Use type conversions explicitly.
Avoid mixed type comparisons.
Never ignore compiler warnings.
Take proper care for integer divisions.
Check for integer overflow.
Check for overflow in intermediate results.
Avoid additions and subtractions of floating point numbers with a very different magnitude.
Never do equality comparison between floating point numbers.
Anticipate rounding errors in floating point numbers.
Avoid magic characters and strings
Avoid off-by-one errors
Initialize strings to null and use strncpy instead of strcpy to avoid endless strings
Use arrays of characters instead of pointers
Use boolean variables to document conditions
Enumerated types instead of integer or string literals increase readability
Enumerated types in some languages are type-checked which increases reliability
Enumerated types increase modifiability/maintainability
Have the first enumerated value as invalid
Check that array indexes are within the bounds
Use correct subscripts for multi-dimensional arrays
Isolate pointer operations in routines
Check pointers and variables referenced by the pointers before using it
Set pointers to NULL after freeing them

Tuesday, May 10, 2011

Reversi (Othello)

I had developed the game of Reversi (Othello) during my BTech from 1999 till 2002. The earlier versions of the game was in DOS where user was supposed to enter row/column to make a move. I changed it to cursor based implementation and then migrated the whole code to Windows. I also added Windows Help which included history, rules and information about the algorithms I had used in deciding the best move for the computer. There were three different algorithms used:

The move which captures the highest number of opponent's balls - this was very preliminary
Each cell was given a priority with corner cells given higher priority than the middle ones and the next move was decided based on this priority.
A minimax search tree based approach where I was creating a tree of moves by both the players and was trying to get the best move that results in a win ultimately.

During my BTech Thesis in final year, I extended the game such that the games played would be stored in a knowledge base and in deciding the next move, the computer will refer the knowledge base to see if the move has resulted in losing the game in the past. This reduced the points for a particular move.

Recently I thought of learning programming in Android and downloaded the Android SDK. What is the best way to learn other than to implement Reversi on Android? But I am late. I found there are lots of Reversi games available here. The best I got more information is from Aart Bik. Aart is a Software Engineer at Google with a nice blog and a website. He has developed chess as well as checkers too.

His website gives details about perft, a method to verify the correctness of the move generation engine and different board representations. It also talks about protocols UCI, XBoard and Winboard that completely separates game engine from the user interface. During my own development of Reversi, I had never thought about doing all this standardization. Lucky me to know it today!

Wednesday, April 27, 2011

Find depth of a k-ary tree given a parent array : Solution 2

The problem statement is mentioned here. The first solution is here.

We can reduce some processing by observation that we only need to find the depth of leaf nodes. Any node whose index appears in the parent array $P$ is an internal node and do not need to be processed. The modified solution is as below:

Let L be an array of length N
Initialize all elements of L to true
for each node i from 0 to N-1
   if P[i] is not -1
       set L[P[i]] to false
   end if
end for
initialize maxDepth to 0
for each node i from 0 to N-1
    if L[i] is false, continue next node
    set parent to P[i]
    set depth to 1
    while parent is not -1
       set parent to P[parent]
   increment depth
    end while
    if maxDepth < depth
       set maxDepth = depth
    end if
end for

As can be seen, the time complexity of the algorithm is $O(N^2)$ where $N$ is the number of nodes. Thus the worst case is still not improved. The space complexity is $O(N)$.

Monday, April 25, 2011

Railway Quota Allotment Problem

I describe the following problem and in future posts will try to find optimal solutions to the problem. The problem statement is as follows:

The Indian Railway maintains a quota based system where each station is given some number of seats. These seats are located in different kinds of classes, say sleeper, 3-tier AC and 2-tier AC. Every train has different number of compartments and so the length of the trains differ. Every stations has one entry point where passengers come on the platform and they walk to the compartment where quota is allotted for that particular station and then take the seat. Our aim is to minimize the distance passengers have to walk.

To simplify the problem we will assume following:

There is only one class of compartment in each train. So all the passengers travel to that compartment only.
Length of each train is same, i.e. $l$.
There are $N$ stations with entry points $e_i, 1 \le i \le N $ such that $ 0 \le e_i \le l $. You can see that $e_i$ is expressed as a distance from the start of the train.
Every station occupies $x_i$ length of the train such that $ 0 < x_i <= l, 1 <= i <= N$. It is easy to see that $ \displaystyle\sum\limits_{i=1}^N x_i = l $. If station $i$ is allotted quota at distance $d_i$ from start of the train, the next quota allotment can happen only at distance $d_i + x_i$.
The quota allotted to each station is at $q_i$ distance from the start of the train such that $0 <= q_i < l, 1 <= i <= N$. The problem is to find $q_i$ for each station such as to optimize $ \displaystyle\sum\limits_{i=1}^{N} | e_i - q_i | $. See that not all the passengers at station $s_i$ has to travel till $q_i$ but to simplify the things, we assume so.

The image on the left shows a sample problem for two stations and a probable solution. The total distance travelled by passengers on both the stations is $e_1 + e_2 - x_1$.

There are several methods that I will try to focus on including greedy method and dynamic programming since the problem is to find an optimal solution. I will focus on finding whether the solution has an optimal sub-structure. The number of different possible solutions depend on the permutations of stations and so is of the order of $N!$. The problem becomes more complicated by usage of absolute value of $ e_i - q_i $. I haven't seen this kind of problem in literature so I don't think this will be solved easily by me.

Friday, April 1, 2011

Code Complete by Steve McConnell : Chapter 10

The author discusses points in using variables in this chapter:

Limit the scope of a variable as much as possible.
Keep references of a variable together
Initialize a variable as close to its use
Don't assign value to a variable until just before the value is used

The author advises to check that the programmer is not using a variable whose life span is already over and the value is stale. The author recommends binding of value to a variable as late as possible. The author provides example of binding a variable's value at compile time and run time.

The author then discusses a British computer scientist Michael Jackson's structured programming method which develops a general relationship between data structures and control structures.
The author suggests to use each temporary variable for only one purpose and avoid names like temp for a variable. The author suggests avoiding use of same variable to store values and using special values for error.
There shouldn't be unreferenced/unused variables in the code.
The author then provides list of problems in using global variables:

Use of global variables is against principle of information hiding.
Use of global variables increases the risk of side-effects.
Global variables increases the risk of aliasing, i.e. referring to the same variable with two different names.
Re-entrant code will not work correctly with global variables.
Global data limits the scope of code reuse.
Global data damages modularity and intellectual manageability.

The author then provides some valid reasons for usage of global data.
The author then provides some guidelines to reduce the risk of using global data:

Use naming convention to differentiate global variables.
Use locking to control access to global variables.

The author then provides a list of benefits to use access routines for global data:

The implementation of global data remains hidden from the rest of the program and can be changed when required without impacting the rest of the program.

The author then provides some rules on how to use access routines for global data.

Code Complete by Steve McConnell : Chapter 9

In this chapter, the author focuses on how to choose good names for data. The author suggests to have a readable, memorable and appropriate name for a variable. The author provides following guidelines for choosing a variable name:

The name of the variable should fully and accurately describe the entity the variable represents. The name would be long and easy to decipher. The author then provides some examples of good and bad variable names.
Variables with loop scope can be given short names and variables with global scope should be given long names.
Variables that contain computed values should have their names modified with a suffix representing the computed value, i.e. variables holding total, maximum, average should have total, max and avg as the suffix in the name. This convention makes sure that the significant part of the name comes first and duplicate names with the qualifier at the front and at the end, e.g. countEmployee and employeeCount are avoided.
Make use of precise opposites in names.

The author then provides guidelines for naming data in several common usage:

Loop Indexes:

If the variable scope is within a loop, short names like i, j and k can be used.
If the loop is longer or there are nested loops, it is better to give proper variable names to avoid index cross-talk.

Status Variables: Status variables should never have flag in its name. Flags should be assigned values and their values should be tested with enumerated types, named constants and global variables that act as named constants.
Temporary Variables: Temporary variables are sign that the programmer does not yet fully understand the problem. Programmer would treat temporary variables casually increasing the chances of error.
Boolean Variables: Use typical boolean names like Done, Error, Found and Success. Avoid negative names like NotFound or NotDone.
Enumerated Types: In naming members of an enumerated type, use group prefix or suffix.
Named Constants: The name should represent the abstract entity rather than the constant value it refers to.

The author then provides some points on why, when and how to create naming convention.
The naming conventions provide several benefits:

No local decisions are required to be taken allowing to concentrate on more important characteristics of the code.
Reading source code across projects become easier.
They help in learning code more quickly.
They reduce name proliferation.
They compensate for language weaknesses.
They emphasize relationships among related items.

The author enlists following reasons to have naming conventions:

When multiple programmers work on a project
When one programmer hands over a program to another for modification and maintenance
When program written by one programmer are reviewed by another
When the program is large in size
When standard terms and abbreviations are required in coding

The author then provides some naming guidelines for language independent features:

Global variables should be easily identifiable with g_ prefix.
Module variables should be easily identifiable with m_ prefix.
Use a prefix or suffix to identify type definitions.
Named constants need to be distinguishable from variables.
Enumerated types should be distinguishable from named constants, variables and functions.
Input only parameters should be identifiable in languages that don't enforce them.
Names should be formatted with separators like _ to enhance readability.

The author then provides naming guidelines for languages like C, Pascal and Fortran.
The author then discusses Hungarian Naming Convention for C programming language and shows its advantages and disadvantages.
The author advises not to use short names but still provides some preferred ways to shorten names if need arises. The author recommends not to use phonetic abbreviations for variable names.
The author then provides some rules to avoid traps in creating abbreviations:

Don't abbreviate by removing just one character.
Abbreviate consistently across the whole program with the same abbreviation
Create names that can be pronounced
Always document short names with translation tables

The author then provides some names to avoid:

Misleading names and abbreviations
Names with similar meanings
Similar Names with different meanings
Names that sound similar phonetically
Numerals in names
Misspelled words
Words that are commonly misspelled
Names with difference in just capitalization
Names of standard library routines and language keywords
Names with hard to read character (e.g. l and I (el and eye), 0 and O (zero and ooh), etc... )