Thursday, February 7, 2013

Design NoSQL tables for Activity Stream

Most of the social networks now-a-days have activity stream, a stream of events for a particular object with very large number of such objects. Facebook has timeline, twitter has its updates, etc... The company I am working in came up with a product similar to Dropbox and we were supposed to implement a similar design pattern in our application. In the future posts I will discuss several designs and compare each of them based on following criteria:

  • Correctness
  • Access Time
  • Storage requirements
See that every object's activity stream can be subscribed. In our application, every shared folder has an activity stream. When a folder is shared with multiple users, every one of them needs to get update on that shared folder. When the user is removed from collaborating on a shared folder, he should stop getting updates on that folder. Joining and leaving a shared folder also constitutes activities in the stream.

Tuesday, April 24, 2012

Create a word cloud from source code of an application

I am currently working at Druva Software. I wanted to create word-cloud from the source of the application I was working on. Here is the command I used:

find ./ -type f -print0 | xargs -0 cat | tr '[:space:]' '\n' | tr -c '[:alnum:]' '\n' | sort | uniq -c

This will generate a text file in a format " 'occurence count' 'word' " on each line. The top word would be empty space or new line which you can simply ignore. Since the command searches in binary files too, I got some non-relevant words too which I removed. I converted the file in comma-separated values and uploaded it to wordle and below is what I got. The application seem to be very 'self'ish.





Monday, November 7, 2011

Find formula for coordinates : Solution

The problem statement is here. Observe that the $ (x, y) $ coordinate appears in the $ (x + y) ^{th} $ solid line. The $ n^{th} $ diagonal has $ 1 + 2 + 3 + ... + n $ elements before it, i.e. $ (x + y) ^{th} $ diagonal has $ \frac{(x+y)(x+y+1)}{2} $ elements before it. Add the $ x $ elements to it to get the point at coordinate $ (x+y) $. So the answer is $ \frac{(x+y)(x+y+1)}{2} + x $.

Friday, November 4, 2011

IIT Delhi MTech Interview Question: Find formula for coordinates

On a grid, the numbering is done as shown in the image below. Dotted lines go up and solid lines go down.
Find out the formula to describe number on grid given the standard $ (x, y) $ coordinate. Credit goes to my colleague Jignesh Parsana for the problem.

Update: I found that this is Cantor's Pairing Function as given at http://szudzik.com/ElegantPairing.pdf

Saturday, October 15, 2011

Design a cache replacement policy for a hard-disk if the stream of sector requests is already known : Solution 1

Problem statement is here.

For the first $ k $ requests, all the sectors are cached. For every request $ s_i, k < i <= R $, we have $ k + 1 $ sectors to put in $ k $ cache location. To discard one of the sectors, we can give weight to every sector which is equal to the difference between the index of next request for that sector and the index of the current request. So if the next request for sector $ m $ is appearing at index $ j $ then the weight of that sector is $ j - i $. Choose the $ k $ sectors with the minimum weight.

Google Interview Question : Design a cache replacement policy for a hard-disk if the stream of sector requests is already known

Problem statement: You are given a hard-disk with large number of sectors say $ N $ and that has a facility to store $ k $-sectors in cache such that read time for cache is very less compared to the time to read sector from a hard-disk and $ N >> k $. Assume the seek time is negligible compared to read-time. You are given a stream of requests $ s_i \text{ such that } 0 <= i <= R, k << R << N $.

Sunday, September 11, 2011

Coder to Developer: Tools and Strategies for Delivering Your Software by Mike Gunderloy : Chapter 11



The author discusses about working in team and the reasons for doing so such as:
  • Large projects with thousands of classes cannot be handled by a single person
  • Creating a whole application by a single person may miss the deadline
  • Creating a software involves not just developing an application but documentation, help files, installer, graphics, a website and many more things so it is better to have a team with different skills
The author then discusses about team management and the 'soft' issues involved in managing the team such as:
  • Choosing a structure: There are two alternatives to deciding a team structure
    • Community of equals: Each member on the team is recognised of being equally capable.
    • Hierarchy:
  • Tracking progress: The author suggests email-notification tool from source code control module or web page refresh from build tools. The author suggests 3 x 3 approach where each member regularly sends a report to other members of the team or just the project manager containing:
    • Three accomplishments for the previous week
    • Three obstacles to progress from the previous week
    • Three plans for the coming week
  • Avoiding your incompetence: The author discusses the Peter Principle, i.e. In a hierarchy, every employee tends to rise to his or her level of incompetence. In most organizations, people are promoted when they do their job well but never demoted. The author gives following tips to become a competent manager:
    • Start with a small team
    • Spend some time on management chores
    • Pay attention to everybody's ideas
    • Read literature on software project management
The author then discusses tools for distributed teams such as:
  • E-mail:
    • Use spam protection
    • Keep project related emails organized
    • Read emails in a batch
    • Instead of sharing every email with everyone, try sharing items through public folders or a news server
  • Instant Messages:
    • Read messages in a batch
    • Keep everybody on the same service
    • Keep an archive of conversations
  • Online Workspaces: The author discusses use of online workspaces like GotDotNet or SourceForge and their pros and cons.
  • Wikis: The author provides several wikis available and what their benefits and disadvantages are.
  • Microsoft Sharepoint: The author discusses advantages and disadvantages of Microsoft sharepoint and groove.
  • Programmer's tools: The author discusses CodeWright and CodeReview.