Tuesday, April 24, 2012

Create a word cloud from source code of an application

I am currently working at Druva Software. I wanted to create a word cloud from the source code of the application I was working on. Here is the command I used:

find ./ -type f -print0 | xargs -0 cat | tr '[:space:]' '\n' | tr -c '[:alnum:]' '\n' | sort | uniq -c

This prints one line per distinct word in the format " 'occurrence count' 'word' "; redirect the output to a text file to save it. The first entry will be an empty word (produced by runs of whitespace and punctuation), which you can simply ignore. Since the command reads binary files too, I also got some irrelevant words, which I removed. I converted the file to comma-separated values and uploaded it to Wordle, and below is what I got. The application seems to be very 'self'ish.
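The manual cleanup steps above (dropping binary files, ignoring the empty word, converting to CSV) could probably be folded into the pipeline itself. Here is a minimal sketch, not the exact command I ran: word_counts_csv is a hypothetical helper name, the "word,count" column order is my assumption about what a word cloud tool would accept, and the --null option assumes GNU or BSD grep.

```shell
# Hypothetical helper: print "word,count" lines for all text files under a
# directory (default: current directory), most frequent word first.
word_counts_csv() {
    dir="${1:-.}"
    # grep -rIl '' lists every file that grep treats as text (-I skips
    # binary files); --null separates the names with NUL bytes so that
    # xargs -0 handles file names containing spaces.
    grep -rIl --null '' "$dir" \
        | xargs -0 cat \
        | tr -cs '[:alnum:]' '\n' \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '$2 != "" { print $2 "," $1 }'   # drop the empty word, emit CSV
}
```

Calling word_counts_csv ./src > words.csv (the path is only an example) would write the counts to a file. Like the original command, this splits on every non-alphanumeric character, so an identifier such as my_var is counted as two words.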