The Siren Song of AWS

I have a deep respect for Amazon as a company. They have been consistently kicking ass for over a decade. They had a long-term vision and stuck to it while everyone else was out chasing the next buzzwordy fad. It worked. Moreover, unlike Google so far, they are not a one-trick pony (to be fair, the No Evil company's pony is more like a galloping flame-breathing dinosaur, but still).

Amazon's second trick is of course AWS. The brilliance of AWS is not in that they figured out how to productize and make money with their existing private cloud infrastructure. Sure, they were the first serious player in the IaaS business but Google could have blown them out of the water had they chosen to. The genius move is this:

– Google App Engine forces you to do everything their way from day one. AWS lures you in with metal (servers, disk) and slowly creeps in on you with their multiple value-added services.

So it goes like this: let's check out AWS. Create an account, set up a Linux box, run your stack. You're the boss, you could switch anytime. So far so good. Say one day you want to use a key-value store and SimpleDB is there. It's simple, oh wait, it isn't. There are many subtleties that you discover as you scale. Next thing you know, your software is tightly coupled with SimpleDB. Migrating to, say, Rackspace becomes a serious project. Not only that, but SimpleDB is not cheap. Your bill starts going up, and so does Amazon's stock.

TL;DR: Like a seductive siren, AWS is doing an awesome job at luring you in and making you stay. Use AWS as long as it works for you, but be ready to escape before you share the fate of the unsuspecting sailors!

And now some music. 
[youtube http://www.youtube.com/watch?v=SBIeEjc7gaI?wmode=transparent]

Peggy

Las Vegas is not known as a beacon of honesty and trust in the world, let’s put it that way. I went there last weekend to attend a rock climbing festival, and I stayed at one of the well-known casino/hotel combos. I checked in to my room and I found this little card on the night stand:

[Photo: the card, bearing the name “Peggy”]

If you have ever waited tables, you know the marketing trick: if you say your name and create a “personal relationship” with patrons, you will likely receive bigger tips. Nobody agrees whether hotel room maids should be tipped or not, but clearly this is the same deal. This card is standard for the hotel, so the idea doesn’t come from the maids themselves. What caught my attention here is the name: Peggy.

According to the awesome Baby Name Voyager tool, Peggy peaked as a name in the 1930s. I don’t know anyone young with that name. It’s hard to imagine a Peggy being a hotel room attendant in Las Vegas in 2011. Perhaps some market research shows that elderly couples are the most likely to leave tips, and they will be most sympathetic to someone with a name that suggests someone of their generation.

Sunday morning I checked out electronically and I was ready to leave the room. “Peggy” knocked on the door to see if she could come in and start cleaning the room. I immediately noticed her Spanish accent. Her name was Ana. I didn’t ask her how well the nom de guerre chosen by her employer is working out for her 🙂

How did you become a hardcore back-end developer?

I just saw this post on Hacker News, and I thought I’d write my answer here. Disclaimer: I don’t know if I am (or ever was) a hardcore back-end developer. I did make money for a long time doing web-scale development work for a bunch of companies.

I started coding in the 80s. No web, no connectivity for the most part, just simple games, random programs, hardware drivers, random utilities. Basic, Pascal, Assembly, C.

Fast forward to 1997. I’m doing my master’s at Carnegie Mellon, enjoying the most awesome network connection. I download my first mp3, a novelty back then. It’s not even illegal, that will come later. A few months later the first mp3 search engines start to pop up. I think I can create one that’s better, and it becomes my personal summer project in 1998. After six weeks of nonstop hacking, I have 2look4.com running (check out the old snapshot, the domain hasn’t been mine since 2001).

So what was 2look4? A few things.

  • A CGI program written in C running under Apache that would process queries and call system("/bin/grep") over a file containing ftp links to mp3 files.
  • An ftp spider called by a cron job that would go through a list of known sites, list the mp3 files there and guess if the site had “unlimited access” or if it required an upload/download/ratio.
  • A submission form for new sites, a very basic poll, some javascript.

One day I put it out there and told some friends about it. The first day it had 100 queries. A week later, 10,000. My linear search with grep held up pretty well for the first few days, but as soon as I started getting concurrent queries it slowed down to a crawl. That’s the first time I saw a web-scale problem. Over the course of three nights I wrote a very simple inverted indexer and searcher to replace grep. It had no tf-idf or word positions. All you could do was an AND search (intersection between the sets of documents containing words), but it reduced the complexity from linear to almost logarithmic (binary search, basically). This was enough for the load of the server to drop and for the machine (a Pentium server) to run pretty much unattended for the next year. It would peak at around 200k queries per day sometime in 1999, before Napster came into the picture and killed ftp servers that shared mp3 files.
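
A minimal sketch of that kind of indexer in Python, to make the idea concrete. This is not the original 2look4 code (which is not shown here): the tokenizer, the function names and the sample ftp links are all assumptions made up for illustration. It keeps one sorted list of document ids per word and answers an AND query by intersecting those lists, using binary search for the membership checks.

```python
import re
from bisect import bisect_left
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (an assumed, very crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the sorted list of ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in tokenize(text):
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def contains(sorted_ids, doc_id):
    """Binary search for doc_id in a sorted list of ids."""
    i = bisect_left(sorted_ids, doc_id)
    return i < len(sorted_ids) and sorted_ids[i] == doc_id


def search_and(index, query):
    """Return the ids of the documents containing every word of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the shortest posting list so there are fewer candidates to check.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = lists[0]
    for other in lists[1:]:
        result = [doc_id for doc_id in result if contains(other, doc_id)]
    return result


# Hypothetical links, for illustration only.
DOCUMENTS = [
    "ftp://host-a.example/pub/mp3/artist_one-blue_song.mp3",
    "ftp://host-b.example/music/artist_one-red_song.mp3",
    "ftp://host-b.example/music/artist_two-blue_song.mp3",
]

if __name__ == "__main__":
    index = build_index(DOCUMENTS)
    for doc_id in search_and(index, "artist blue"):
        print(DOCUMENTS[doc_id])
```

With grep, every query scans the whole file. Here a query only touches the posting lists of its own words, which is where the drop in server load would come from.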

 

In the fall of ’98 I showed 2look4 to a former Carnegie Mellon professor who worked at Inktomi. Long story short, Inktomi acqui-hired me and my puny search engine software (not the site, which earned me some nice cash on the side from ads). I joined the crawling and indexing team and battled Google fiercely for the next four years and ultimately lost. We did a lot of cool stuff though. We grew from indexing 30M documents to 500M in a couple of years, for example.

To answer the title question: I don’t think you “become a hardcore back-end developer”, you simply dip your toes in the water of terabytes and millions of requests per day. Try not to commit any algorithms with orders of complexity that don’t make sense for what you are doing. Pierce through abstraction layers and try to understand what the hardware is doing. Are we killing the network? The cpu? The disks? Do lots of back-of-the-envelope calculations. Write prototypes. Use load generator tools. Instrument code. Optimize carefully and only when necessary. Ask others who know more. It never ends, because technology changes fast. Game-changing technologies become affordable (cloud computing, SSDs, insane amounts of memory). It’s very humbling, because you think your system is as fast as it can be until someone comes along and makes it 10x faster.
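
As an example of a back-of-the-envelope calculation, take the 200k queries per day mentioned above and turn it into queries per second. The sketch below is only an illustration: the function names are made up, and the peak factor of 5 is an assumed number, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(queries_per_day):
    """Average queries per second if traffic were spread evenly over the day."""
    return queries_per_day / SECONDS_PER_DAY


def peak_qps(queries_per_day, peak_factor):
    """Rough peak rate, assuming the busiest moments run peak_factor times the average."""
    return average_qps(queries_per_day) * peak_factor


if __name__ == "__main__":
    daily = 200_000
    print(f"average: {average_qps(daily):.1f} queries/s")
    print(f"peak (assumed 5x): {peak_qps(daily, 5):.1f} queries/s")
```

200k queries per day is only about 2.3 queries per second on average. That number tells you right away how much time each query can afford to take on a single machine.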

 

If you are a software developer, try to work for a while on a web-scale problem. To me, it never gets old.

SSD vs HD on the Mac – Quick and Dirty Benchmark

Recently I switched from my old MacBook (first unibody version, late 2008) to the new MacBook Air. The main difference in terms of specs, as far as I’m concerned, is the disk, as the CPU and memory are virtually the same. The old MacBook had a 5400 rpm Fujitsu drive; the MacBook Air has an Apple TS256C SSD.

Lately I’ve been interested in quantifying the performance of SSDs vs old hard drives to see if we can use them for IndexTank somehow. I still have the old MacBook lying around, so I ran the disk test with Xbench. Check out the results:

HD:

[Image: Xbench disk test results for the hard drive]

SSD:

[Image: Xbench disk test results for the SSD]

Because of the physical nature of SSDs (no moving parts, lower seek times) I expected a significant difference in random read and write times, especially for small chunks. The test confirms this: the most significant differences are in Uncached Reads for 4k blocks (close to a 25x improvement) and Uncached Writes for 4k blocks (almost 20x).

Obvious conclusion: as much as SSDs are a nice-to-have for personal computers, they will make a huge difference for cloud applications. For many applications they offer a speed that starts to approach what you can do with memory, but at a tiny fraction of the cost and with permanent storage as a bonus. I can’t wait to see SSDs in the cloud.

But nobody rocks like… [your town]

Yesterday I saw this conversation on Hacker News about a post titled "Can Montreal Become an Open Source Startup Hub?"

Technology ecosystems – most business markets, actually — have network effects. And that means that the only rank to have, as an ecosystem, is first place. Best in the world.

The post makes the case that Montreal cannot compete as a web startup hub with the Bay Area, or even for second place with New York. What's left? Picking a niche and dominating it. The author further argues that for Montreal this is Open Source software startups. Maybe he's right, although I have my doubts. What the post did was make me think about other cities in the world with vibrant startup communities and large pools of engineering talent. Buenos Aires is the first example that comes to my mind, because it's my hometown. It could be Sao Paulo, Madrid, London, pick your favorite major city.

So what's special about Buenos Aires? Could the cradle of Tango become a hub for any kind of software startups? On one hand, Buenos Aires has lots of bright software developers. On the other hand, Argentinian culture (just like that of most countries in the world, perhaps) is very risk-averse when compared to the US. That can be overcome with time, as examples of successful companies encourage others to take more risks. There are a few already (Mercado Libre, even though it's an outlier of sorts). There is another issue, though: one thing the US does well is concentrating different industries or activities in different hubs:

  • San Francisco Bay Area – startups
  • Hollywood – film
  • New York – finance
  • Detroit – car industry (disregard the current state of affairs)
  • Cleveland – well…

Buenos Aires is a very diverse city, and it has so much going on. You won't see very many waitresses waiting for their big break or overhear many nerds discussing their programming language of choice unless you know where to look. This diversity reduces the serendipity of running into opportunities that you get in a small world like the Bay Area. Also, it makes a talented software developer feel more isolated and willing to emigrate and try to compete in the big leagues. Essentially, I doubt Buenos Aires or any other city can quickly become a hub for any kind of niche software startups, mostly because it will have a hard time retaining the necessary talent. Sure, there will be some startups that will do fine but I don't expect the next Google or Facebook to pop up there. What will happen for sure is that startup gurus will visit the city on tour and tell everyone willing to hear them that it's possible for them to pick a specific niche and rule it. Casual gaming perhaps? Latin-American adaptations of concepts that worked in the US such as eBay or Groupon? It's a bit of a consolation prize for the world-class brains who, if born in California, might have created the next Apple. For the time being, the Big League is in the Bay Area.

BTW, the title of this post is a reference to The Simpsons, Episode 22 of season 3.

Nigel Tufnel: [addressing the crowd] We were told they knew how to rock in Shelbyville.
[the crowd 'boos']
Derek Smalls: But nobody rocks like…
[looks on the back of his guitar where he has placed a reminder of the name of the town they're playing in]
Derek Smalls: Springfield!

Lamabot (fortune cookies and Markov chains)

If you are on Twitter, there is a chance you follow the Dalai Lama or have seen one of his cogitations retweeted by his ever-growing following. If you're like me you may think he actually runs a factory of fortune cookies, or that he wrote an advanced phrase regurgitation algorithm.

So I was at the office yesterday, Friday afternoon. It was a long week, and I had the idea of downloading His Holiness' most recent 200 tweets to play with. If you studied computer science or engineering you may be familiar with the concept of Markov Processes. To explain it in a simple way, a Markov process is a series of states in which each state depends only on the previous one. Not only that, but it's random. A very basic one: follow a road until the next fork. When you get there, pick one path at random. It doesn't matter how you got to that fork, but given that you are there it's possible to make statistical predictions about where you'll be later.

The idea of using Markov chains to generate random but semi-coherent text has been around for a long time. The gist of it is to analyze some text, find groups of words that are likely to be together and generate chains of text of arbitrary length attaching a group to the next (last word of a group is the first of the next). It's not hard to code this, but I was ready to play with the tweets and wanted to get going. I did a quick search for Markov chains in Ruby and found this blog post with some code ready to use.
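
A minimal sketch of that approach, written in Python rather than the Ruby code from the blog post mentioned above (which is not reproduced here). The function names, the `order` parameter and the sample sentences are made up for illustration, and the sentences are not real tweets. With `order=1` each group is a pair of words, and the last word of one pair is the first word of the next.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length, rng=None):
    """Random walk over the chain, producing at most `length` words."""
    rng = rng or random.Random()
    state = rng.choice(sorted(chain))  # random starting state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state was only seen at the end of the text
            break
        next_word = rng.choice(followers)
        output.append(next_word)
        # The next state depends only on the current one, not on how we got here.
        state = state[1:] + (next_word,)
    return " ".join(output)


# Made-up sample text, for illustration only.
SAMPLE = (
    "we need a warm heart . we need patience and a calm mind . "
    "we must create peace with a warm heart ."
)

if __name__ == "__main__":
    chain = build_chain(SAMPLE)
    print(generate(chain, 12))
```

Words that follow a state more often appear more often in its list, so picking uniformly from the list reproduces the frequencies of the source text. A larger `order` gives more coherent output, at the price of copying the source more literally.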

Five minutes and ten lines of code later, Lamabot was online thanks to Sinatra and Heroku (making web apps has never been easier). Yes, it's crude and it could be improved to be much more realistic. Some Dalai-esque pearls of wisdom that came from it:

"We need self-confidence and therefore ethical conduct for the world is the Sunday edition of envy and love." (he's been listening to the Arcade Fire?)

"We need to be possible as a warm heart, warm heart." (I don't know, James Blunt?)

"We need self-confidence and recent scientific findings." (yeah, I agree, making some sense here)

"We can't be fearful of the needs of universal dimension." (I want what he's having)

"We must use violence, you may neglect our responsibilities toward each other differences, all beings must create peace." (yeah, rock'n'roll, let's break some chairs and then be friends again)

And my favorite, nobody could argue with this one:

"We cannot overcome anger and bears with the confines of World Summit."

Why Non-engineers Think Engineers Are Better Off Joining Startups

Generalizations almost always have exceptions. In this case, the exception is perhaps the majority of instances. If you are going to make the claim that the best option for an engineer is to join a startup, you’d better have compelling arguments.

I just read Why Engineers Are Better Off Joining Startups by Bindu Reddy on TechCrunch and found a couple of “jewels”:

“All this has caused a severe shortage of good engineering talent. Which is why, the time has never been better to work at a startup.”

“More importantly, the one thing that every passionate engineer cares about—the ability to build and ship products—is harder at large companies.”

As an engineer, it’s easy to destroy this kind of logic. The first sentence says that because of a shortage of engineering talent, the time has never been better to work at a startup. Well, for one, you cannot make decisions in a time other than the present. If you are looking for a job, the only time that matters is now. A current shortage of engineering talent may affect the decision of someone who is choosing a career, hoping the shortage will persist in time. If you are already an engineer, it’s a case of the rising tide lifting all boats. Higher demand means that you will get a better salary either at a startup or a large company. I know that in the Bay Area companies like Oracle are paying very high salaries to attract talent who otherwise would go to more interesting places. This is why many of my friends went to work for HP rather than startups in 1998. A shortage of engineers does not magically make it more attractive to go to a startup than to a large company.

The second sentence is even easier to deal with. There is no “one thing” that every passionate engineer cares about. Most likely passionate engineers do care about building and shipping products, but we also care about being healthy, having families, traveling and doing other human things that non-engineers also enjoy. Some people just cannot deal with the lifestyle of a startup for external reasons. Even if you decide to make all sacrifices because you really want to be involved in the building and shipping of a product, there are a few things you must know:

  • Startups change all the time. You will work on dead ends, and it will be frustrating.
  • You may have to take off your engineer hat. Sometimes the company needs customer support. Or extra marketing, or sysadmin/operations. Or sales. Sometimes you have to first sell things you don’t have, and then build a crappy, I-hope-nobody-sees-this-code prototype.
  • Your skills may become irrelevant if the company pivots. Say you are an expert in information retrieval and the company morphs from a technology vendor to a user destination site. You will be asked to make the website prettier, to optimize load speeds or to write blog posts. 

In contrast, at a large company:

  • You will have stability. If you are hired by Oracle or Google to improve the garbage collection algorithms of the JVM or optimize where to display certain types of ads, chances are you’ll be able to work on this specific problem for long enough to accomplish something.
  • You will work reasonable hours that will allow you to try being a “well-rounded person.”
  • You won’t have a lot of impact on whether the company lives or dies. Whether this is good or bad depends on your personality.

I am the CEO of a small startup and I haven’t worked for a large company since the 90s. I love what I do because it’s a job, a hobby and a game at the same time. Our employees enjoy the game too, but it would *never* cross my mind to make a blanket statement that all engineers should work at startups. Sometimes I interview people whom I plainly tell that I would not be looking for a startup job if I were them.

If you are a smart engineer, don’t believe the hype. Do what you’ve been trained to do: gather information, weigh the pros and cons, find the compromise that works for you and decide for yourself.

Ah, Southwest…

At the Southwest SFO check-in counter a few minutes ago. The desk guy tells me that my bouldering pads are oversized: each one is 67 linear inches (width, length and height added together) and the maximum allowed is 62. He'll have to charge me $50 per item. I smile, beg for mercy, he won't budge. Ok, you got me.

I give him my credit card, he takes forever to process it. In the meantime I check my email with my phone, almost a reflex at this point.

Guy: "Sir, by any chance did you take my picture?"
Me: "What? No, I was checking email."
Guy: "Ok, because some people take our picture and that's illegal."
Me: "(speechless)"

Of course, Guy never told me that I could have simply tied both my pads together and paid $50 for one oversized item instead of $100. I bet he gets two "employee of the month" credits for making Southwest $100 (and deeply annoying a frequent flyer).

About the illegality of taking his picture: WTF.

Double Sunset

This weekend I flew from San Diego to San Francisco on Southwest Airlines. Thanks to an early check-in I was able to grab a window seat on the left side of the plane, facing the ocean when flying north. My flight took off shortly after sunset, and as the plane climbed the sun rose again. I saw a second sunset and felt like the Little Prince from Saint-Exupéry’s book. This is the second sunset, over the Channel Islands.

[Photo: the second sunset, over the Channel Islands]

The Upside of Irrationality

Just finished reading The Upside of Irrationality by Dan Ariely (highly recommended, just like Predictably Irrational). One take-away from this book: if you thought making decisions while under the influence of strong emotions was a bad idea, now there is hard data to back that up. It's worse than you thought, because those decisions affect your future decision-making patterns.

Side note: this book was a pleasure to read on the iPad.