Software Development Estimates, Where Do I Start?

For some reason, many people discuss the problem of estimating software development timeframes without properly understanding the issue. There is a famous answer on Quora that exemplifies this. Lots of people like that story, even though it’s inaccurate and misguided. “Software development” is such a broad endeavor that it doesn’t even make sense to talk about estimates without understanding the kinds of problems software can solve. To put this in context, let’s forget software for a second and look at a few tangible problems of different magnitudes.

  • You are a medical researcher. A new disease makes headlines. It’s a virus. It seems to be spreading through sexual contact. How long is it going to take you to find a cure?
  • It’s 1907, and you’ve just built the first airplane. The government wants you to build a spaceship to fly to the moon. How long will it take?
  • You are in charge of a construction company that has built hundreds of buildings in your metropolitan area. I want you to build me a twenty-story apartment complex very similar to the one you just finished on the other side of town. When will it be done?
  • You own a chair factory that produces 1000 chairs per month. I need 3500 chairs. Can you make them in a month?

The four questions above are radically different in nature. The first two involve significant unknowns and require scientific or technological breakthroughs. The other two, not so much. Meta-question: does software development look like the first kind or the second kind? Another meta-question: what kind of software development are we talking about?

Let’s focus on the construction company. People have been constructing buildings for centuries. There’s relatively little variation in the effort and costs of putting up vanilla high-rises (we’re not talking about the Burj Khalifa). Of course there is uncertainty: economic conditions could change, suppliers could go out of business. The new mayor could have a personal vendetta against your company because your big brother bullied him in school. All those things have happened before, perhaps in combination. Let’s say you can give me an estimate of 15 to 24 months with 98% confidence based on past data. Sounds good to me.

If buildings were software, you could break out a template, customize it a bit, install it on my plot of land in the cloud, the end. There are companies doing this for software, for example Bitnami (disclosure: I’m an investor). The process is so quick that you don’t even need to ask for a time estimate. You just see it happen in real time.

Now imagine that software could not be cloned at almost zero cost, just as physical things cannot. Most software developers would be like monks copying manuscripts. If you have been handwriting pages for long enough, you can confidently tell me that it will take you at least 15 years to produce a high-quality copy of the entire Bible (it can be done 4x faster today, not sure about the quality though). You could get sick, or suffer interruptions. However, the total number of hours you’ll need is well known. There is a type of software development that works like this: porting old applications to a new language (say, COBOL to Java back in the day).

Of course, the more rewarding problems in software development look nothing like this. I enjoy trying to solve problems that nobody has solved before. The malleability of software makes it easy to explore an open problem. Some problems are deceptive: at first they may look like building a house, and as you discover unknowns they sometimes mutate to resemble a quest for an AIDS vaccine. If a problem is solvable with software, it may take weeks or months to come up with an imperfect solution. It will probably take years to build one that’s scalable and robust. The meta-problem is that some problems cannot be solved with software, or at least not yet, or not by me / my team / my company. I might give you an estimate that looks like this:

  • less than a month: 30% chance
  • between a month and a year: 40% chance
  • never: 30% chance

Another kind of software development sits somewhere in the middle, and it may be the one that generates the most software jobs. Usually an organization wants a solution to a problem that has already been solved by others (e.g. building a cluster manager for a social network graph). Even though you don’t have access to the design or the code, you have an idea of what the solutions look like. You don’t know what key issues they ran into, how good the teams were, how lucky they got. Still, you know the problem can be solved by companies that look like yours for a reasonable cost. There is still quite a bit of uncertainty: you can estimate small tasks reasonably well, but you cannot predict which “week-long” task might expand to several months (e.g. it turns out that no open source tool solves X in a way that works for us, we’ll have to write our own).

The gist of why estimates are hard: every new piece of software is a machine that has never been built before. The process of describing how the machine works is the same as building the machine. The more your machine looks like existing ones, the easier it is to estimate its difficulty. Of course it won’t be exactly the same machine; that can happen with a chair but not with software. On the other hand, you may want to boldly build what no one has built before. In that case, you’ll most likely adjust your scope so that you can build something that makes sense for the timeframes you work with. The solution to the original problem might take iterations over generations. Not necessarily generations of humans, perhaps generations of product versions, teams or even companies. You may set out to put a man on the moon, and your contribution would be the first airplane.

Discuss on Hacker News

Uber for Everything

San Francisco has 800,000 inhabitants. How many cordless drills are there in this city? Probably orders of magnitude more than we actually need. I bought one six months ago, used it just once. It’s now another object in a box filled with stuff. At the time I must have thought that the most convenient way to hang a shelf on the wall was to buy a $22 drill with free Amazon Prime shipping.

What I really needed were the holes, not the drill. Maybe I should put it out in the street and forget that it existed. What if for $22 I could have had a reliable person show up at my place within an hour, drill the holes, and go away? I’d probably do that every time I needed holes (maybe once a year, no idea). I know I could find someone on TaskRabbit to do it, but is it as easy as buying a drill on Amazon?


I know this is a silly example of a First World problem; that’s not the point. What’s interesting to me is how typical households in the developed world contain caches of random objects that we use with varying frequencies. You probably use your toothbrush at least once a day (let’s hope). How about other stuff in your bathroom? What’s in your closet, or in your garage? Perhaps you have a tennis racket that you bought ten years ago when Rafa Nadal was still an unknown. He’s gone through hundreds of rackets since, while yours sat idle next to your mother’s old dining set, on top of a case containing a $300 guitar that you played for a week, inspired by Slash’s performance during a Guns N’ Roses concert you saw. Then the band fell apart, and Axl spent ten years “working” on an album called Chinese Democracy that few people remember. While far from their best work, Chinese Democracy is way better than, say, Liz Phair’s Funstyle. But that’s not important right now, let’s get back to your neglected guitar.

Why did you spend $300 on a guitar? It probably seemed like the best option at the time given the alternatives you had. You probably couldn’t borrow one from a friend, and you thought there was a good chance you’d use it for a long time. It seemed justified. We humans are pretty bad at predicting the future, and sometimes that’s very costly. On a much larger scale, those of us who live in California know how this state embraced the car/freeway combo during the twentieth century. The state was developed during the short window of time when cars and freeways seemed like the solution to all transportation problems. Now we are stuck with an inefficient transportation system, and we need to own our private cars to drive on public freeways.

What if we had to design the United States transportation system from scratch today? With today’s technology, perhaps we’d want public roads and public cars. It might work like this:

You need to go from A to B, so you walk outside. There are a bunch of cars parked within a minute of your doorstep. They are all more or less indistinguishable, like parking meters or traffic lights. You pull out your phone, tap the “car” icon, and see the lights of a silver sedan blink. You drive it to B, and you park it somewhere. Your app charges you a toll for the trip. That exact car probably won’t be there when you get back, so you have to take your stuff with you. Perhaps you have a standard robot trunk that fits into all cars. It follows you around when you walk, and inserts itself into the car you drive. Another robot goes around refueling cars. Cars that break down are mysteriously repaired at night. In this imaginary country, owning a car makes as much sense as owning a road.

Of course I’m not suggesting that we build the above system (I’d prefer self-driving Segways). All I’m saying is that we have the technology to do it if we wanted to. In fact, let’s forget cars. What other kinds of objects that we own could be replaced by services? I can imagine startups taking advantage of niche opportunities the same way ride-sharing services like Uber and Lyft are disrupting the taxi business. Could the “drill-me-a-hole” app become a billion-dollar startup? Perhaps not, but what are the conditions for an object-replaced-by-a-service to be a viable business? Here are a few:

  • Latency: if I need a cab, I don’t want it tomorrow. It’s reasonable to wait for ten minutes, but an hour might be too much. For a hole in the wall, I could wait until tomorrow. What about owning a dinner set for 12 people in case we have guests over? I may want to schedule it to show up Friday at 5 pm, with a pickup of the dirty dishes the next day after 11.
  • Liability: what if I make my drill available for peer-to-peer rental, and the next person to use it breaks it? What if my drill is used to commit a crime?
  • Liquidity: what if I request a dinner set at 4 pm, but there are none available until tomorrow? What if I want a relatively rare object of which only five exist in San Francisco?
  • Model: peer-to-peer (“AirBnB for drills”) or centralized (“Zipcar for blenders”)?
  • Cost-effectiveness: could someone put a drill in my hands in the next hour, and pick it up tomorrow morning, for less than it costs Amazon to deliver one in 48 hours (and never get it back)?

There must be lots of things for which new “sharing economy” and “unusual things as a service” startups could figure out the operational details. Uber, Lyft, TaskRabbit, Zipcar, that’s just the beginning. Imagine the free space and extra money you’d have if you could have a vacuum cleaner at your place in 30 minutes and gone an hour later, two extra chairs for the weekend, an air mattress for a week, a barbecue for six hours on Labor Day, a Bigfoot Garden Yeti for… well, never?

Discuss on Hacker News

Why Search Is Hard

Yesterday I was reading this Hacker News thread about the Common Crawl, and one comment caught my attention:

Common Crawl is awesome. I wonder how complex it would be to run a Google-like frontend on top of it, and how good the results would be after a couple days of hacking…

I wondered why it wasn’t obvious to this person that it would be very complex to do that, and that the result would not be great. I realized that many people who search Google in 2013 are too young to remember the early days of web search (I’ll use Google as an example for this post, replace with your favorite search engine if you’d like). There was a time when building a decent search engine was relatively simple:

  • Crawl a few million pages (in 1997, that would have included pretty much every interesting web page out there).
  • Create an index in which every word points to all the web pages containing it, just like the one at the end of a textbook.
  • Write a CGI script to parse a query from a search box into separate words, look up each word in the index, intersect the resulting lists of pages, and return (at most) 10 results.
  • Render those results as ten blue links on a white page. Include the search box on that page in case the user wasn’t satisfied with the (probably crappy) results we just provided.
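To make the 1997 recipe concrete, here is a minimal in-memory sketch in Python. It assumes the pages have already been crawled into a dict mapping URL to text; `build_index`, `search`, and the sample pages are hypothetical, there is no CGI layer, and results are simply sorted rather than ranked.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query, limit=10):
    """Return up to `limit` URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Look up each word's page set (empty if unknown), smallest set first.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    results = set.intersection(*postings)
    # No ranking at all: sort alphabetically only to make output deterministic.
    return sorted(results)[:limit]


pages = {
    "http://a.example/": "cheap flights to paris",
    "http://b.example/": "paris weather today",
    "http://c.example/": "cheap hotels in paris",
}
index = build_index(pages)
print(search(index, "cheap paris"))
```

Everything the rest of this post describes (spelling correction, personalization, blending other data sources, ranking) is missing from this sketch, which is the point.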

Fast forward 16 years. Today’s search engines are no longer about the web. They are about reading people’s minds, and finding answers to questions we cannot even articulate properly. A naive search engine built using the Common Crawl data on top of 1990s technology would feel clunky and tedious, like riding a horse for a ten-mile commute. If you don’t believe me, just go to Google (either on your computer or your phone) and start typing a few characters. Pay attention to all the things that happen:

  • Google fixes spelling mistakes as you type. Every keystroke is sent back to a server that tries to help you as quickly as possible. What does this person want? How much information do I have so far? Should I suggest searches, or should I display results? What device is this person using, and what does the screen look like?
  • Google knows who you are. On Google, you’re searching your own personal universe. If you and I start typing a first name (J-o-h), odds are it will show us people we know named John, along with their pictures. They probably won’t be the same people.
  • The web is only a small fraction of your information needs. It’s been a long time since search engines restricted themselves to finding web pages. Today they will do math for you, find real-time flight information, tell you about the local weather, show you when the next bus will come by your stop. If you type an artist or a song title, you’ll probably see a video or two. Back in 2007 Google called this Universal Search, but the concept had been around for years. In 2000-2001 we were already working on blending different types of results at Inktomi.
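The first bullet describes suggestions that update on every keystroke. As a minimal sketch of only the simplest ingredient, assuming a plain list of past queries (the `suggest` function and the sample queries are hypothetical), prefix lookup can be done with a sorted list and binary search. Real engines layer spelling correction, personalization, and ranking on top of this; none of that is shown here.

```python
import bisect


def suggest(sorted_queries, prefix, limit=5):
    """Return up to `limit` past queries that start with `prefix`.

    `sorted_queries` must be sorted; all strings sharing a prefix then
    form one contiguous run in the list.
    """
    # First position where `prefix` could be inserted: every query that
    # starts with `prefix` sits at or after this index.
    start = bisect.bisect_left(sorted_queries, prefix)
    matches = []
    for query in sorted_queries[start:]:
        # Stop at the limit, or as soon as the contiguous run of matches ends.
        if len(matches) == limit or not query.startswith(prefix):
            break
        matches.append(query)
    return matches


past_queries = sorted(
    ["george carlin", "george carlin article", "george harrison", "louis ck"]
)
for typed in ["g", "geo", "george c"]:
    print(typed, "->", suggest(past_queries, typed))
```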

A mainstream search engine acts a bit like a very diligent psychic. It has performed a background check on you. It knows more about you than you can probably imagine, including patterns you are not even aware of. It uses that information plus a number of large data sources (of which the web is just one) to guess what you want. To make matters more complicated, the usual input methods don’t let you articulate your desires as if you were talking to your personal search genie:

“Hey Mysterious Finder of Things, I’m looking for an article that I read sometime last week. There was a phrase that was really funny, it had something to do with George Carlin. Or maybe Louis CK, I’m not sure.”

Instead, you’d go to Google and start typing something like “George Carlin article.” Perhaps you’d do it directly from your browser bar without even noticing that you’re searching Google. As you type, you might see the URL and title of the article you read 6 days ago. If you don’t, you could actually pick a suggested query, submit the search and refine the results. How often do you do that? If you see really old results, you could tell Google to restrict the search to articles from the past week. How do you think Google knows that an article is from the past week, or from the past year? It’s not as easy as it seems; Google knows when it first saw an article, not necessarily when it was written. What if the article was created ten years ago but only made visible to Google last week?

I could keep going, and make this into a book about the immense amount of work that has taken search to the current state of the art. I’ll stop here. The point of this post was to show how even tech-savvy people take for granted the extremely complex mechanisms that power a search engine. It may be possible to make some investors believe that you can build a usable alternative to Google in a year, but anybody who calls him/herself a hacker should know better.

Discuss on Hacker News

Open-source Something Often

If you write code for a living, when was the last time you released something as open source? If you can’t remember, I’d hold that against you in an interview. Why? Assuming you take pride in your work (if you don’t… well), open-source code is an incentive to:

  • Make sure the code is not horrendously embarrassing.
  • Verify that a random person can check it out and make it work.
  • Explain to others why (and maybe how) your code works.

You may be a star coder at some top-notch Silicon Valley darling, and you may work on code that has been working flawlessly for years. Whether you like it or not, the more obscure and undocumented your code, the more the company depends on you. Your peers may quickly review your code every so often (ship it! ship it good!), but that doesn’t mean anybody else knows what’s going on. Some companies, like LinkedIn, understand this, and make it known that your code could (and probably will) be open-sourced one of these days.

I’m writing this post because this weekend, when I open-sourced my code to generate word clouds from Twitter searches, I was reminded of the three bullet points above.

I wrote this stuff as I was learning Clojure last year. I wanted to create word clouds such as the ones on Wordle. I discovered a great Processing-based library called WordCram, and I wrote some code around it to fetch and render text from a variety of sources: Twitter timelines, searches, the global timeline, RSS feeds, etc. Here’s an example of a word cloud for “snowden” early this morning:


I thought it would be really easy to release this as open source, but it turned out to be a fair amount of work for what’s barely over 100 lines of Clojure. Here are some of the issues:

1) As I decided to release the code, I tested it against Twitter and it didn’t work. Why? Twitter had deprecated the pagination-based search API that clojure-twitter supported. I had to replace it with twitter-api, and adapt my code to work with timelines.

2) I realized that my config.clj file had my private Twitter app credentials. I had to add it to .gitignore so I wouldn’t commit it by mistake, and I had to do that before adding the repo to GitHub, because otherwise it would be part of the history. I had to create a dummy config.sample file, and explain in the README how to obtain and insert your own credentials.

3) I still have to document my code and make it more idiomatic, but I don’t care that much about it because I’m not looking for a job as a coder. I’ll do it at some point, or maybe someone will tell me how much I suck in a pull request 🙂

4) I thought I was done, so I created the GitHub repo. I did a fresh checkout, and it worked. I tweeted about it, and I asked a friend to see if it worked for him. Of course it did not work. Why? This is where the fun begins.

I had created the project with Leiningen 1.7, and I’d had to create a local Maven repository because WordCram and other needed libraries are not in any public ones. I’d followed the instructions on this post from late 2011. I had since upgraded to Leiningen 2.1.3, for which this recipe doesn’t work anymore. The problem is that the .m2 directory under my home already had all the dependencies it needed, so a fresh build did not need to fetch anything from the Maven repo. Sure enough, when I moved .m2 out of the way, everything broke.

A search on Google took me here, which was the start of the solution. That almost worked, except that it created a slightly different repository structure than the one Leiningen expected for my four artifacts. I had to recreate the Maven repo with a script that gave a different groupId to each artifact, and then my friend sgrove was able to make the code compile and run.

I still wasn’t done. The first query he tried was somewhat obscure and returned zero results, which broke my program (not enough arguments to the reduce function).
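The zero-results crash is a classic. In Clojure, `(reduce f coll)` on an empty collection calls `f` with no arguments, so a function that only accepts two arguments fails; the fix is to pass an initial value, `(reduce f init coll)`. Python’s `functools.reduce` has the same trap: it raises `TypeError` on an empty iterable when no initial value is given. Here is a minimal Python sketch of the fix, assuming a hypothetical `word_counts` helper over a list of tweet texts (this is not the original Clojure code).

```python
from collections import Counter
from functools import reduce


def word_counts(tweets):
    """Count word frequencies across a list of tweet texts."""
    per_tweet = [Counter(text.lower().split()) for text in tweets]
    # Without the Counter() initial value, reduce() raises TypeError
    # when `per_tweet` is empty, i.e. when a search returns no tweets.
    return reduce(lambda total, counts: total + counts, per_tweet, Counter())


print(word_counts(["snowden leaks", "snowden flight"]))
print(word_counts([]))  # an obscure query with zero results no longer crashes
```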

Moral of the story: I’ve been coding for 30 years, and look at all the stuff I had to learn in order to open source what’s essentially a puny script. Humbling and inspiring at the same time.

What have you open-sourced for me lately?

Another Silly Startup Analogy

When I was a teenager in Argentina in the 1980s, there was a weekly TV show called “Feliz Domingo” (Happy Sunday). It ran live for nine hours, between one and ten pm. It was a game show for high school students near graduation, who competed in events involving different skills. A typical event would have four or five individual students (or small teams), each representing a 30-40 person group from a graduating class. Students competed on national history trivia, blindfolded obstacle courses, performance art, etc. Some of the events were somewhat weird; it’s not easy to fill up nine hours’ worth of live TV (there were also live musical performances by all sorts of local bands such as Los Fabulosos Cadillacs, but I digress).

One of the coolest events was about memory and diction. You’d get a random trivia category such as Greek Philosophers, and you’d have to name as many as possible in ten seconds, no duplicates. I remember one girl who had memorized 25 items for each of 30 categories, and was capable of intelligibly reciting any 25 items alphabetically in 10 seconds. That talent earned her school one of 20 or so spots in the “final round.”

The final round went like this: there would be a locked “coffer” (El cofre de la felicidad) containing the grand prize: enough cash to send the contestant’s group (40 to 50 people) on a week-long trip to a ski town by the Andes. The host would put the key into a plastic cylinder and mix it with other similar-looking keys that wouldn’t open anything. There would be one key per contestant. Contestants lined up according to what event they’d won. The host would ask the next contestant in line his or her name, school, and number of people in the group. He/she would walk up to the cylinder and pick a random key. There would be five seconds of suspense while the boy or girl jiggled the key as it failed to open the lock. The process would repeat until one lucky contestant picked the right key, and… watch the video below, no subtitles needed.

I probably don’t have to spell out the analogy at this point, but I will anyway. You had to figure out how to participate in the show in order to have a chance to win. I’ll spare you the details, but this wasn’t trivial (be in the right place at the right time). You needed skill in order to get a key (a chance of a good exit). And ultimately you needed dumb luck to pick the right key. If you didn’t get lucky… well, you might have one or two more shots before the end of your graduation year. Feliz Startup!

Hacker News discussion here.


Being the First Employee of a Startup Could Be Right for You

In SiliconValleyStartupLand there are memes that some repeat as if they were absolute truths. One of them is “the first employee of a startup gets screwed.” Sure, and the black guy always dies first in movies. While we are at it, you cannot get pregnant the first time you have sex (yes, some people believe this).

The truth is that being the first employee of a startup could be shitty, or it could also be a great investment of your time and effort. When I say “great investment” I’m not talking about outlier outcomes such as that of Craig Silverstein, Google’s first employee. That’s just winning the lottery several times. For the sake of the argument I’ll stick with the more common case: a startup raises a seed round or even a series A. They hire a few good people, operate for a few years, and exit for a number between 0 (failure) and maybe 100M.

In the worst case scenario, no employees (including the founders) receive a payout at the end of the journey. In the 100M case, the founders are set for life. Employee #1 (let’s call him Joe) makes a nice chunk of change but not enough to “solve the money problem.” If during all that time Joe had been working as hard as the founders while making a shitty salary, he will feel like he got a raw deal. This raises an obvious question: why did Joe take a shitty salary, a comparatively small equity stake, and then proceed to work as hard as a founder? Answer: Joe did not know what he was doing. He sold himself short.

So, without counting on riding a Google (or even an AirBnB or a Heroku), when/how/why is it good to be employee #1? Let’s describe one very specific case:

Bob is relatively young. He just finished college, and he’s working as an engineer for some boring corporation far from the coasts. He’s smart and ambitious, and he’d like to start a company. Maybe he even has an idea, a prototype, a potential cofounder named Pedro. What he doesn’t have is a few months’ worth of Bay Area living expenses in the bank.

Bob would like to move to the Bay Area with Pedro, make connections, raise money. They’d be burning their savings quickly while learning the Silicon Valley game. It may work, but it would be much harder than for someone who’s already been here for a while. Of course there are ways to solve that problem. Being accepted by an incubator such as Y Combinator is one of them. However, most people won’t be that lucky.

In this particular case, being employee #1 of a startup could be the way to go. Bob (and maybe Pedro) could move to Silicon Valley for free, and make a reasonable salary. By reasonable I mean: you can rent an apartment in a decent location, live well, and have disposable income. Any startup with funding should be able to pay you a competitive Bay Area salary. If a startup cannot do that for you, then they need to make you a cofounder.

Bob will also get some equity, which could be 5% of what a founder has if he’s lucky. He should negotiate the best deal he can get, but not place any expectations on the potential value of his stock. He may leave before his first cliff, get diluted in subsequent rounds of funding, and of course the startup could crash and burn after a winter or two. The important thing is what Bob should try to accomplish while at the startup, namely:

1) Learning what being a founder means. He will have a chance to see the struggles of the founders up close, perhaps a window into their decisions and thought processes. He can take notes, imagine what he’d do differently, see how things turn out.

2) Making connections. Bob is in Silicon Valley, so he can attend all sorts of events. He can go talk to customers. He may work with potential partners for the startup. If the startup is doing well, he has all the time in the world to meet people who could become potential cofounders. If it’s not, then he’s getting a better deal than the founders from a financial perspective.

Bob is getting paid to get a crash course in startup life. He’s not committed to the company like a founder, so he has the option to abandon ship if/when things look dire. Also, he gets a lottery ticket as a nice bonus. Put this way, one or two years as employee #1 could be the best option for Bob.

If you’re not in Bob’s situation, then you may not want to take that gig. Just don’t go around preaching absolute truths about how stupid it would be to be employee #1, or about how black guys always die first.

Discuss this post on Hacker News

My Criteria for Investing in Startups

Whether I like it or not, I’m an angel investor, even though I prefer a different label. I don’t want to picture myself as a has-been who doesn’t have the fire to build another company. A lucky bastard who plays Fantasy Football with money that other people would use to educate their kids, buy a car, pay debt. While that may be true, I invest in startups because I need to feel useful. If I were a full-time money manager, or a VC with a responsibility to LPs, I would have a duty to find investments with the best possible rate of return from a strictly financial standpoint. I would not want to be in that position. I invest in some startups because I genuinely enjoy it. I don’t plan to allocate more than 5% of my portfolio to startups. Losing it all would not be a catastrophe. On the other hand, I get an immediate psychological payoff from the investments I make. I thought about how to maximize the “personal return” I’d like to get from my investments, and I ended up with a few rules (somewhat flexible, subject to change). Here they are:

1) I strongly prefer investing in companies located in San Francisco (city proper, not the Bay Area).

Why? I live in San Francisco, and I enjoy driving on Bay Area freeways as much as having a root canal done by a drunken dentist who hates my family. I like to be able to take public transportation and have meetings with founders, sometimes spontaneously. I may find myself having an IM conversation with a founder, and think it would be best to just grab coffee and scribble on a whiteboard. One of my portfolio companies is about eight blocks from where I live, and another one is fifteen minutes away by Muni. If I believe those meetings could be helpful to the companies, that makes me feel useful. That’s a positive return to me. If I had to drive to Palo Alto or Mountain View, I’d have to subtract two hours of soul-destroying driving from that return.

It turns out that San Francisco is one of the best cities in the world for certain kinds of startups, and there are plenty of them to choose from. If I lived in Chicago, Berlin, or Madrid, I might not invest in startups at all.

2) I invest in companies that I believe are trying to improve the world.

Notice that this is a very subjective statement. Someone might think Facebook or Twitter are improving the world; I don’t. I avoid the entertainment industry, and anything where the revenue stream is based on advertising. I prefer investing in health, transportation, education, and (under some very specific circumstances) finance. Don’t get me wrong, I’m still trying to make money with my investments. This is not charity, and I’m not Bill Gates (although I’m a huge fan of his current work). There are tons of profitable businesses that I believe can improve the world: better and cheaper medical technology, cleaner / less risky transportation, applications to help people avoid financial suicide, etc. My vision of a better world may not coincide with yours. I may be wrong or misguided, who knows. What matters to me is believing that the potential contributions of these companies to the world outweigh their negative externalities.

3) Hey, why didn’t I start that company?

As an electrical engineer and software developer with decades of experience building internet infrastructure, there are certain areas where I believe I could contribute much more than the puny sums that I can invest. If I see a company that I would have wanted to start, I’d like a chance to contribute to its success. I’ve been around Silicon Valley for a while so I’m connected to many engineers and investors whom I deeply respect. Some lawyers, even. Thirty years after writing my first program in BASIC I still code for fun, and I doubt I’ll ever stop. I like to analyze data, find relations between variables, extract insights that are sometimes far from obvious or even counterintuitive.

My skills make me useful to a relatively small subset of all startups out there. I recently attended a Demo Day for a well-known incubator. I had the impression that 95% of the startups that presented could not benefit from my investment any more than they would by taking money from a random stranger. For the others, I would be excited to be involved in some form. I’m not looking to become an employee of a company, and I like having the option (but not the obligation) to contribute work. When I make an investment that’s financially meaningful to me (probably more than to the startup), my brain automatically starts paying attention to its market, competitors, technology, introductions I could make, plugins or applications I would write, etc.

I wrote this post because I felt like it, but hopefully I did a decent job of selling myself as a potential investor. If your seed-stage company fits the criteria above, please contact me.

By the way, don’t follow me on Twitter 🙂

Guns and Preventable Suicides

Warning: what you are about to read is mostly speculation. Take with a grain of salt. Don’t trust my numbers blindly. Find your own data.

The suicide rate in the United States is twelve per one hundred thousand inhabitants. That means that every year, around 38,000 people kill themselves: about one hundred per day. Of course, not all suicides are equal. For example, some are the result of a carefully planned decision. Others are less planned; they may not be spontaneous, but they are more preventable. I’m not going to claim expertise in the subject: most of what I know comes from conversations with my father, a psychiatrist who treated suicidal patients for decades.

Continuing in the vein of my recent posts, I decided to plot suicide rates vs. household gun ownership by US state:



The above chart surprised me. The correlation between the two variables is pretty high (0.6). What’s going on here? Here’s the first hypothesis that comes to my mind: suicidal people with easy access to guns are more likely to actually kill themselves.
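The charts here were made in R; as a rough illustration, here is a Python sketch of the Pearson coefficient, which is presumably the measure behind the 0.6 (the later murder-rate charts label theirs explicitly as Pearson’s r). The input values below are made up for four imaginary states and are not the data behind the chart.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical values for four imaginary states (not real data)
gun_households = [0.20, 0.35, 0.50, 0.60]  # share of households with a gun
suicide_rates = [8.0, 11.0, 15.0, 20.0]    # suicides per 100,000 inhabitants
print(round(pearson_r(gun_households, suicide_rates), 2))
```

A value near 1 means the points lie close to an upward-sloping line. It says nothing about which variable drives the other, which is exactly the question the rest of this post wrestles with.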

I can imagine a scenario in which someone crosses an emotional threshold, grabs a gun from a drawer and kills himself (the majority are men). In a parallel world where there is no gun in the drawer, this person would not be able to commit suicide so easily. Perhaps he would go out and drive his car into a wall, but that would give him more time to reconsider. The time elapsed between deciding to shoot oneself and being dead can be very short.

If you believe that these people should not die, then you could use this hypothesis to make the case for gun control. The majority of US gun deaths are in fact suicides, and firearm suicide is by far the most common method, so this is not an insignificant problem.

Of course there are other possible interpretations of the data. Maybe people in certain states are more predisposed to suicide AND to own guns? Occam’s razor would say no, but it’s worth a look. On one extreme we see states like Montana, Wyoming and Alaska. On the other, New York, New Jersey, Maryland. There are certainly demographic differences across these states: population density, income levels, etc. Could Montanans be three times more suicidal than New Yorkers due to those factors? I don’t know, and I can’t rule it out.
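One standard way to probe that question is a partial correlation: the correlation between gun ownership (x) and suicide rate (y) after removing the linear effect of a third state-level variable z, such as population density. The sketch below uses the textbook first-order formula. Only the 0.6 comes from the chart above; the values for r_xz and r_yz are hypothetical, and the method can only control for variables someone thought to measure.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# r_xy = 0.6 from the chart; r_xz and r_yz are made-up illustrations for a
# hypothetical confounder z (e.g., population density)
print(round(partial_r(0.6, -0.7, -0.5), 2))
```

If the result stayed close to 0.6, the confounder would not explain much of the relationship; if it dropped toward zero, it might.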

It’s also interesting to observe what happens across countries. I scraped Wikipedia’s List of countries by suicide rate and plotted that against Number of guns per capita by country:


In this case, the correlation is almost nonexistent (0.08). Of course, no other country comes close to the US in guns per capita, so this is not a complete surprise. Also, the data is different: this is total guns per capita, not percentage of households with guns.
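Before two scraped lists can be plotted against each other, they have to be matched country by country, and only countries present in both lists survive. A minimal sketch of that join step follows. The footnote-stripping rule and the alias table are assumptions about how scraped country names tend to look, not a description of the actual Wikipedia tables, and the values are placeholders.

```python
import re

# Hypothetical alias table: two lists may spell the same country differently
ALIASES = {"United States of America": "United States"}


def clean_name(name):
    """Strip bracketed footnote markers such as '[3]' and unify spellings."""
    name = re.sub(r"\[[^\]]*\]", "", name).strip()
    return ALIASES.get(name, name)


def join_by_country(table_a, table_b):
    """Inner join of two {country: value} dicts on the cleaned name.

    Countries that appear in only one table are dropped, so any correlation
    computed afterwards covers the overlap only.
    """
    a = {clean_name(k): v for k, v in table_a.items()}
    b = {clean_name(k): v for k, v in table_b.items()}
    return [(c, a[c], b[c]) for c in sorted(a.keys() & b.keys())]


# Placeholder values for illustration only
suicide_rate = {"United States of America[2]": 12.0, "Country B": 9.0}
guns_per_capita = {"United States": 1.0, "Country C": 0.3}
print(join_by_country(suicide_rate, guns_per_capita))
```

The resulting pairs can then go into any correlation routine. Silently dropped countries are worth listing separately, since a name mismatch looks exactly like missing data.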

At this point, all I have is the question: would it be possible to prevent thousands of deaths every year by making guns less accessible to suicidal individuals? Would it be worth the effort? I wish I could offer answers to these questions, but I’m just one guy with some free time. All my data and “insights” come from Google, Wikipedia, and R. What data is out there that I could be missing?

Final note: this is a very sensitive topic. If you are going to comment on this post, please be reasonable and rational.

More Charts: Murders, GDP, Inequality

I decided to plot country murder rates against gross domestic product per capita, and also against Gini coefficients. I scraped the data from these sources (.csv files at the end of the post if you’d like to use them).

This is what murder rates vs. GDP per capita look like. The Pearson correlation in the chart (r) is -0.32.



Murder rates vs. Gini indexes. r = 0.5


This is the same chart as above, restricted to countries in the Organization for Economic Co-operation and Development (OECD):


And the same chart, zoomed in (Mexico and Chile are outside):


Finally, the data. One caveat: if you look at the World Bank spreadsheet, you will see that not all countries have current Gini statistics. I used the most recent one for each country. These charts might look different if current data were available for every country.
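The most-recent-value rule can be sketched as follows. The layout (a per-country mapping from year to value, with None for missing years) is an assumption, loosely modeled on how a wide World Bank export looks; keeping the year alongside the value makes it easy to see how stale each country’s figure is.

```python
def latest_value(series):
    """Return (year, value) for the latest year with data, or None if there is none.

    `series` maps year -> value, with None marking a missing estimate.
    """
    available = [(year, value) for year, value in series.items() if value is not None]
    # Years are unique keys, so max() picks the latest year
    return max(available) if available else None


# Hypothetical Gini series for two imaginary countries
gini = {
    "Country A": {2004: 41.0, 2008: None, 2010: 39.5, 2012: None},
    "Country B": {2004: None, 2008: None, 2010: None, 2012: None},
}
latest = {country: latest_value(years) for country, years in gini.items()}
print(latest)
```

Countries with no estimate at all come back as None and simply drop out of the chart.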

Now, do I dare draw any conclusions from these charts? We see the correlations, but what do they mean? I see many poor countries with high murder rates and high inequality. We could speculate that the problem for those countries is a failure of the rule of law. It’s even more tempting to guess what’s happening in the last chart. Developed countries like the US make it easier for individuals to become very wealthy, and don’t offer a comparatively good safety net for the relatively poor. This incentivizes people to work hard, start businesses, avoid poverty. Obviously not everyone will succeed. Perhaps the price to pay for this model is a higher murder rate.

Of course it would not be serious to arrive at that conclusion from these hasty charts, but I can pose the question. Feel free to play with the data, and to suggest other variables to analyze.

Mass Shootings, Political Correctness, and Magical Thinking

Speaking in Newtown, CT yesterday, President Obama said:

We can’t tolerate this anymore.  These tragedies must end.  And to end them, we must change.  We will be told that the causes of such violence are complex, and that is true.  […]  Surely, we can do better than this.  If there is even one step we can take to save another child, or another parent, or another town, from the grief […] then surely we have an obligation to try.

It was a comforting speech for the victims of a tragedy, so it would be unfair to criticize the arguments from the point of view of logic. However, it is worth analyzing the issue of mass shootings as a problem that might be addressable with public policy.

I would start by measuring the magnitude of mass shootings as a problem. How does it compare to other issues such as preventable diseases, regular crime, terrorism? I searched for data and found that in the past 30 years, 543 people have been killed in 70 mass shootings. That’s an average of 18 deaths per year. For comparison, about three times as many people die from lightning strikes each year.

The New Republic article linked in the previous paragraph states “I can’t say exactly why mass shootings have become such a menace over the past few years, and especially in 2012.” Given the low numbers, it’s likely that it is just a random fluctuation without statistical significance.
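One rough way to check whether a bad year stands out from chance is to treat mass shootings as independent events arriving at the 30-year average rate (70 incidents in 30 years, about 2.3 per year) and ask how often a year with at least k incidents would occur anyway. This Poisson model with a constant rate is a simplifying assumption, and since no single-year count is given here, k is left as a parameter. Death counts are noisier still, because one incident can account for dozens of victims.

```python
import math


def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # turn P(X = i) into P(X = i + 1)
    return 1.0 - cdf


incidents_per_year = 70 / 30  # about 2.3, from the figures quoted above
for k in range(1, 8):
    # Chance that a given year has k or more incidents purely by chance
    print(k, round(poisson_tail(incidents_per_year, k), 3))
```

Over a 30-year window, even a tail probability of around 5% per year implies one or two such years by chance alone.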

To put things in perspective again, half a million Americans die every year from tobacco use. Two hundred thousand die from medical errors. Those numbers are large enough that it’s possible to track changes with statistical significance and evaluate the effect of public policy. There must be a fair amount of low-hanging fruit. For example, it’s plausible that a 100% tax on the price of cigarettes would save thousands of lives every year. Why is this not attempted? Probably because the special interest group that controls tobacco sales is powerful enough to stop it.

For mass killings, the numbers are already so low that the logical question would be: is it worth doing anything to try to reduce even more the chance of mass killings? What could be the undesired side effects of implementing policies to that effect? For example, let’s say that someone came up with a vaccine that guaranteed that a child who received it would never be a mass killer. However, one child in 100,000 dies from an adverse reaction to the vaccine. Clearly the vaccine itself would cause more deaths than mass killings, so it’s a net negative if we are trying to minimize unnecessary deaths.
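The arithmetic behind that conclusion is short. Assuming roughly four million US births per year (an approximation not stated in the text) and that every newborn receives the vaccine, a one-in-100,000 fatality rate works out as below, compared with the roughly 18 mass-shooting deaths per year quoted earlier.

```python
births_per_year = 4_000_000       # assumption: approximate annual US births
vaccine_death_rate = 1 / 100_000  # from the hypothetical vaccine above
mass_shooting_deaths = 543 / 30   # about 18 per year, from the earlier figures

# Every newborn vaccinated: expected deaths caused by the vaccine each year
vaccine_deaths = births_per_year * vaccine_death_rate
print(f"vaccine: ~{vaccine_deaths:.0f} deaths/year")
print(f"mass shootings: ~{mass_shooting_deaths:.0f} deaths/year")
```

Even if the vaccine prevented every mass-shooting death, it would cost roughly twice as many lives as it saved.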

At this point, I have to disagree with Barack Obama. I don’t think we have an obligation to try to reduce the incidence of mass killings, because there is a high chance that an intervention would be iatrogenic: the cure would be worse than the disease. This is not a politically correct thing to say, so you won’t hear politicians say it. That doesn’t mean our legislators will do anything, of course. Mass killings are as inevitable as lightning deaths, and they will continue to be news precisely because they are infrequent and horrible.

Who knows, maybe doing nothing is the right thing. There are medical procedures that are not recommended anymore because they have potential complications, and they offer no measurable benefits when compared with inaction.

What makes matters more complicated is that mass shootings bring up the issue of gun ownership in the US. If this killing had been a bombing, nobody would be talking about gun control. However, many people who normally don’t think about gun crime are emotionally moved by mass shootings. From a logical viewpoint, we should be more concerned with gun crime in general. If gun crime is a significant problem, then gun control could be a solution to that problem. Surely gun control would have side effects, but it’s likely that those side effects would not offset the gains.

So, is gun crime a problem? In the US there are about 3 gun homicides for every 100,000 inhabitants every year. That means about ten thousand people are shot to death in the country each year. For the average American, the odds of being murdered with a gun are roughly 500 times higher than those of dying in a mass shooting. His or her odds of dying of cancer are “only” 60 times higher than those of being murdered with a gun, so the problem is not insignificant.
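The rate-to-count conversion in this paragraph is easy to check. The population figure is an assumption (roughly 314 million, the approximate US population in 2012); the other inputs are the figures quoted in this post. The script prints the implied number of gun homicides per year and its ratio to the average annual mass-shooting death toll.

```python
population = 314_000_000         # assumption: approximate US population in 2012
gun_homicide_rate = 3 / 100_000  # gun homicides per inhabitant per year
mass_shooting_deaths = 543 / 30  # about 18 per year, from the figures above

gun_homicides = population * gun_homicide_rate
ratio = gun_homicides / mass_shooting_deaths
print(f"gun homicides per year: ~{gun_homicides:,.0f}")
print(f"ratio to mass-shooting deaths: ~{ratio:,.0f} to 1")
```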

Let’s say that we believe that the cost of implementing gun control is less than the benefits. Perhaps we can save four thousand lives every year if we make it harder for criminals to obtain guns. More importantly, we can do it without taking any resources away from the fight against the main causes of death: cancer, heart disease, and accidents. How would we go about it?

The US is a unique place when it comes to guns. As of 2009 there were 310 million non-military firearms in the country. It is possible to make it illegal to produce and buy new ones, but what do we do with the existing ones? What kinds of imbalances would be created if those who would only use guns to protect their property could not own them? What if most potential murderers kept their guns, and the guns turned in (say, for cash or tax breaks) were mostly the ones least likely to be used in a murder? What kind of black markets might arise for guns and bullets?

I’m not even going to try to answer those questions, because they are extremely complex. I personally hate guns. I have never owned or even fired one. I wish they didn’t exist, but they do. However, believing that gun control would immediately save lives is magical thinking. It might work in the long run if implemented correctly for the US, but when it comes to reducing murders it would not be a silver bullet (pun intended).

The other issue that many bring up when mass killings happen is mental illness. There is little question that those responsible for mass killings fit most definitions of “mentally ill.” However, they are a minuscule minority. At the same time, mental illness is a horrible condition that causes an enormous amount of suffering. It affects millions, and there is no question that it would be a good idea to address it through public policy. This might have the bonus of preventing the odd massacre in which the potential perpetrator could have been under treatment for a condition such as paranoid schizophrenia. However, not all sufferers of this condition would seek treatment. Norway has one of the best healthcare systems in the world, and that didn’t stop Anders Behring Breivik from killing 77 people. Some conditions stay asymptomatic for a long time and then manifest themselves too quickly. “He seemed like such a nice, quiet guy. I don’t know why he flipped out.”

If there is one point I’d like to make with this long rant, it is that public policy should not be dictated by emotions. Minimizing unnecessary deaths and appeasing public opinion are different things. Most human beings do not understand concepts such as statistics or iatrogenics, so they will clamor for immediate feel-good action. I wish I lived in a world where people (or at least leaders) would always analyze issues rationally, and would act to maximize public good instead of their chances of being re-elected. All I can do is ask my readers to try to understand all sides of a delicate issue before forming an opinion, as I attempted to do in this post.

Discuss on Hacker News (please be civil!)