Peggy

Las Vegas is not known as a beacon of honesty and trust in the world, let’s put it that way. I went there last weekend to attend a rock climbing festival, and I stayed at one of the well-known casino/hotel combos. I checked in to my room and I found this little card on the night stand:

[Image: the card from the nightstand, bearing the name “Peggy”]

If you have ever waited tables, you know the marketing trick: if you say your name and create a “personal relationship” with patrons, you will likely receive bigger tips. Nobody agrees on whether hotel room maids should be tipped, but clearly this is the same deal. This card is standard for the hotel, so the idea doesn’t come from the maids themselves. What caught my attention here is the name: Peggy.

According to the awesome Baby Name Voyager tool, Peggy peaked as a name in the 1930s. I don’t know anyone young with that name. It’s hard to imagine a Peggy being a hotel room attendant in Las Vegas in 2011. Perhaps some market research shows that elderly couples are the most likely to leave tips, and that they are most sympathetic to a name that suggests their own generation.


Sunday morning I checked out electronically and I was ready to leave the room. “Peggy” knocked on the door to see if she could come in and start cleaning the room. I immediately noticed her Spanish accent. Her name was Ana. I didn’t ask her how well the nom de guerre chosen by her employer is working out for her 🙂

How did you become a hardcore back-end developer?

I just saw this post on Hacker News, and I thought I’d write my answer here. Disclaimer: I don’t know if I am (or ever was) a hardcore back-end developer. I did make money for a long time doing web-scale development work for a bunch of companies.

 

I started coding in the 80s. No web, no connectivity for the most part, just simple games, random programs, hardware drivers, random utilities. Basic, Pascal, Assembly, C.

 

Fast forward to 1997. I’m doing my master’s at Carnegie Mellon, enjoying the most awesome network connection. I download my first mp3, a novelty back then. It’s not even illegal yet; that will come later. A few months later the first mp3 search engines start to pop up. I think I can create one that’s better, and it becomes my personal summer project in 1998. After six weeks of nonstop hacking, I have 2look4.com running (check out the old snapshot; the domain hasn’t been mine since 2001).

 

So what was 2look4? A few things.

 

  • A CGI program written in C running under Apache that would process queries and call system("/bin/grep") over a file containing FTP links to mp3 files.
  • An FTP spider called by a cron job that would go through a list of known sites, list the mp3 files there and guess whether the site had “unlimited access” or required an upload/download ratio.
  • A submission form for new sites, a very basic poll, some JavaScript.

One day I put it out there and told some friends about it. The first day it had 100 queries. A week later, 10,000. My linear search with grep held up pretty well for the first few days, but as soon as I started getting concurrent queries it slowed to a crawl. That’s the first time I saw a web-scale problem. Over the course of three nights I wrote a very simple inverted indexer and searcher to replace grep. It had no tf-idf or word positions. All you could do was an AND search (an intersection of the sets of documents containing each word), but it reduced the complexity from linear to almost logarithmic (binary search, basically). This was enough for the load on the server to drop and for the machine (a Pentium server) to run pretty much unattended for the next year. It would peak at around 200k queries per day sometime in 1999, before Napster came into the picture and killed the FTP servers that shared mp3 files.
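For readers who have never seen one, a minimal sketch of that kind of AND-only inverted index might look like the following. It is in Python rather than the original C, and the tokenizer, the function names, and the sample FTP links are all made up for illustration; this is not the 2look4 code, just the general idea under those assumptions.

```python
from bisect import bisect_left


def tokenize(text):
    """Split text into lowercase alphanumeric words (an assumed, simplistic tokenizer)."""
    words, current = [], []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def build_index(docs):
    """Return a sorted vocabulary and, for each word, a sorted list of document ids."""
    postings = {}
    for doc_id, text in enumerate(docs):
        # set() so each document is recorded once per word; no positions, no tf-idf.
        for word in set(tokenize(text)):
            postings.setdefault(word, []).append(doc_id)
    vocab = sorted(postings)
    return vocab, [postings[word] for word in vocab]


def lookup(vocab, plists, word):
    """Binary-search the vocabulary; return the word's document ids, or an empty list."""
    i = bisect_left(vocab, word)
    if i < len(vocab) and vocab[i] == word:
        return plists[i]
    return []


def search_and(vocab, plists, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # Start from the rarest word so the intersection stays small.
    lists = sorted((lookup(vocab, plists, w) for w in words), key=len)
    result = set(lists[0])
    for plist in lists[1:]:
        result &= set(plist)
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical links, for illustration only.
    links = [
        "ftp://example.org/music/blue_band-first_song.mp3",
        "ftp://example.net/pub/blue_band-second_song.mp3",
        "ftp://example.org/music/red_band-first_song.mp3",
    ]
    vocab, plists = build_index(links)
    for doc_id in search_and(vocab, plists, "blue band"):
        print(links[doc_id])
```

Each query word costs one binary search over the vocabulary instead of a scan over every link, which is where the drop from linear to roughly logarithmic comes from; the intersection itself only touches the documents that actually contain the words.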

 

In the fall of ’98 I showed 2look4 to a former Carnegie Mellon professor who worked at Inktomi. Long story short, Inktomi acqui-hired me and my puny search engine software (not the site, which earned me some nice cash on the side from ads). I joined the crawling and indexing team, battled Google fiercely for the next four years, and ultimately lost. We did a lot of cool stuff though. For example, we grew from indexing 30M documents to 500M in a couple of years.

 

To answer the title question: I don’t think you “become a hardcore back-end developer”; you simply dip your toes in the water of terabytes and millions of requests per day. Try not to commit any algorithms with orders of complexity that don’t make sense for what you are doing. Pierce through abstraction layers and try to understand what the hardware is doing. Are we killing the network? The CPU? The disks? Do lots of back-of-the-envelope calculations. Write prototypes. Use load generator tools. Instrument code. Optimize carefully and only when necessary. Ask others who know more. It never ends, because technology changes fast. Game-changing technologies become affordable (cloud computing, SSDs, insane amounts of memory). It’s very humbling, because you think your system is as fast as it can be until someone comes along and makes it 10x faster.
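As one small example of a back-of-the-envelope calculation, here is the 200k queries per day figure from the 2look4 story turned into a per-query time budget. The peak factor of 5 and the single worker handling one query at a time are guesses for illustration, not measurements, and the function names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(queries_per_day):
    """Average queries per second if the load were spread evenly over the day."""
    return queries_per_day / SECONDS_PER_DAY


def time_budget_ms(queries_per_day, peak_factor, workers=1):
    """Milliseconds each query may take before requests start to queue at peak.

    peak_factor is an assumed ratio of peak load to average load.
    """
    peak_qps = average_qps(queries_per_day) * peak_factor
    return 1000.0 * workers / peak_qps


if __name__ == "__main__":
    daily = 200_000   # figure quoted in the post
    peak_factor = 5   # assumption: peak traffic is 5x the daily average
    print(f"average: {average_qps(daily):.1f} queries/s")
    print(f"budget at peak: {time_budget_ms(daily, peak_factor):.0f} ms per query")
```

Under those assumptions the average is only about 2.3 queries per second, but the budget at peak is about 86 ms per query, which is the kind of number that tells you quickly whether a linear scan can survive.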

 

If you are a software developer, try to work for a while on a web-scale problem. To me, it never gets old.