Monday, January 17, 2011

Code Complete by Steve McConnell : Chapter 4

This chapter focuses on the steps to build a routine from scratch, specifically it describes PDL (Program Design Language)-to-code process.

Author enlists many advantages of using PDL:
  • When PDL is translated to code, PDL turns into programming-language comments which eliminates most commenting effort.
  • PDL makes reviews easier since only detailed design written in PDL needs to be reviewed and not the actual source code.
  • PDL supports the idea of iterative refinement.
  • PDL makes changes easier.
  • PDL is easier to maintain than other forms of design documentation since PDL remains along with the code and in sync with code-changes.
The author describes following steps in designing a routine:
  • Prerequisites: Before doing any work on the routine, the job of the routine should be clearly defined and fits cleanly into the overall architecture.
  • Problem definition: The problem the routine will solve should be stated in enough detail with regards to following:
    • Inputs to the routine
    • Outputs from the routine including the global variables affected
    • Information the routine will hide
    • How the routine will handle errors
  • Name the routine: A routine should have a clear, unambiguous name. If there is problem in coming up with a good name, it usually indicates that the purpose of the routine is not clear.
  • Decide on how to test the routine
  • Think about efficiency: The author specifies two categories:
    • If the system is not performance critical, the routine should be well modularized and is well readable. If it does not meet the performance criteria later, it can be replaced with a faster algorithm or more-efficient programming language. 
    • If the system is performance critical, the architecture should clearly specify the space and time constraints and the routine should follow them.
  • Research the algorithms and data structures:
  • Write the PDL: This can be divided in several parts
    • Write a concise statement of the purpose of the routine.
    • Fill-in the high level PDL for the routine.
  • Think about the data: If data manipulation is a prominent part of the routine, definitions of the key data structures are useful to have when you design the logic of the routine.
  • Check the PDL: Review the PDL and think of how to explain it to someone else. Make sure it is easily understandable of what the routine does and how it does it.
  • Iterate: Try different ideas in PDL before starting to code.
  • Write the routine declaration:  Write the routine interface statement.
  • Turn the PDL into high level comment and fill in the code below each comment: Each comment may translate into one or more lines of code.
  • Check the code informally: Check each block of code and try to think of what it would take to break the block. 
  • Clean up the leftovers:
    • Check the routine interface and make sure that all input and output data is accounted for and all the parameters are used.
    • Make sure routine does one thing and is loosely coupled with other routines.
    • Check for unused data, undeclared variables and inaccurate variable names.
    • Check for off-by one errors, improper nesting and infinite loops.
    • Make sure proper indentation has been used to clarify the logical structure of the routine, the expressions and parameter lists.
    • Make sure the PDL that was translated into comments is still accurate. Comments should provide algorithm descriptions, documentation on non-obvious dependencies and justification for unclear coding practices.
  • Check the routine: Try to mentally execute the whole routine and see if there are any errors.
  • Compile the routine: The author mentions about "Just One More Compile" syndrome where the programmer wants the routine to start working in just one more compilation effort. This leads to hasty, error-prone changes and takes more time in the long run.
  • Debug the routine: Once the routine compiles, put it into a debugger and step through each line of code to make sure each line executes as you expect it to.
  • Remove errors from the routine: If the routine is unusually buggy, don't patch it. Rewrite it. Creating an entirely new design for a buggy routine pays off.

Monday, January 10, 2011

Find Top N : Solution 2

The problem is described here. Solution 1 is here. Server-side Javascript in XPages in IBM Lotus Domino offers a function sort on an array.

Let S strings be stored in an Array names. Calling names.sort() returns sortedNames. Let topNames stores the top N strings with highest occurrence count found so far in descending order of occurrence and topCount keeps track of respective occurrence count. Following is the rest of the algorithm.

For each string s in sortedNames
Begin
   Let start,end store index of s in sortedNames
   Advance end till sortedNames[end] becomes different from s
   Set uniqueCount = end - start
   For each count c in topCount
      if uniqueCount is greater than c
         Begin
            move all the elements in topNames and topCount back by one
            insert uniqueCount in topCount and s in topNames
         End
   Advance sortedNames by uniqueCount
End

Since I don't have complexity numbers for Array.sort, I take the worst case and assume it to be of $O(S^2)$. After that the above algorithm is executed which, if all the strings are unique, take $O(S \times N)$ time. So the time complexity of this algorithm is $O(S^2)$. Space complexity is $O(S + N)$. Just remember that in both the solutions, the comparison of strings are counted. So the algorithms depend on not just the number of strings but on their length too.

Wednesday, January 5, 2011

Find Top N : Solution 1

The problem is described here. Server-side Javascript in XPages in IBM Lotus Domino offers a function @Unique.

Let S strings be stored in an Array names. Passing this to @Unique returns uniqueNames. Let topNames stores the top N strings with highest occurrence count found so far in descending order of occurrence and topCount keeps track of respective occurrence count. Following is the rest of the algorithm.

For each string u in uniqueNames
Begin
   Initialize uniqueCount to 0
   For each string n in names
      if n matches u, increase uniqueCount
   For each count c in topCount
      if uniqueCount is greater than c
      Begin
         move all the elements in topNames and topCount back by one
         insert uniqueCount in topCount and u in topNames
      End
End

Since I don't have complexity numbers for @Unique, I take the worst case and assume it to be of $ O(S^2) $. After that the above algorithm is executed which, if all the strings are unique, take $O(S \times (S+N))$ time. So the time complexity of this algorithm is $ O(S \times (S+N)) $. In most of the practical scenarios, $S >> N$, so the time complexity can be assumed to be $O(S^2)$. Space complexity is $O(S + N)$.

Tuesday, January 4, 2011

Find Top N

I am putting a problem here that I am currently working on:

You are given a list of S strings. Find the top N strings with the highest occurrence count from this list.

e.g. Let the list of strings be [ "xyz", "pqr", "abc", "xyz", "abc", "xyz", "lmn" ]
The top 2 strings are ["xyz" # 3, "abc" # 2] where number after the hash gives the count of occurrence.

I am working on the problem in server-side JavaScript that is used in XPages technology in IBM Lotus Domino. I already have found two solutions which I will discuss in next posts.

Updated: Solution 1, Solution 2 and Solution 3 (the optimal) are available.

Code Complete by Steve McConnell : Chapter 3

The author focuses on WIMP-Why isn't Marry Programming? or WISCA-Why Isn't Sam Coding Anything? syndrome where programmers do not understand the importance of preparation and prerequisites. There are two main reasons for this:

  • The programmers themselves can't resist the urge to begin coding
  • The managers are unsympathetic towards programmers who spend time on construction prerequisites.
The author enlists following prerequisites to be met before construction can be started:
  • Problem Definition Prerequisite: There should be a clear statement of the problem the system is supposed to solve. The problem definition should be in user language and is without any reference to possible solutions.
  • Requirements Prerequisite: Explicit requirements help achieve two things
    • End user can review and agree to requirements and the programmer has a clear view of what user wants.
    • They help avoid arguments.
    Requirements are never stable and a typical project does experience 25% change in requirements during development. The author also provides a check-list to ensure that the requirements are content-rich, high-quality and complete.
  • Architecture Prerequisite: The quality of the architecture determines the conceptual integrity of the system which in turn determines the ultimate quality of the system. An architecture should
    • describe the system in broad terms.
    • consider alternatives and provide reasons why one particular organization was chosen.
    • define major modules with well-defined interface.
    • show possible enhancements and that the effect of them is limited to small number of modules.
    • provide reasons for not using off-the-shelf components.
    • describe the major files, tables and data-structures to be used.
    • describe or reference algorithms used and provide reasons why alternatives were not used.
    • specify user interface.
    • specify memory management.
    • specify a strategy to consistently handle errors.
    • clearly specify the degree of robustness and fault-tolerance of the software and what will be the response of the software in case of critical errors.
    • provide a strategy how the software will handle internationalization.
    • provide estimates of speed and memory use of the software.
    The author also provides a list of issues a good architecture would address.
  • Programming Language Prerequisite: Before construction begins, programming conventions should be spelled out.