Software Development Estimates, Where Do I Start?

For some reason many people discuss the problem of estimating software development timeframes without properly understanding the issue. There is a famous answer on Quora that exemplifies this. Lots of people like that story, even though it’s inaccurate and misguided. “Software development” is such a huge endeavor that it doesn’t even make sense to talk about estimates without an understanding of the kinds of problems software can solve. To put this in context, let’s forget software for a second and look at a few tangible problems of different magnitudes.

  • You are a medical researcher. A new disease makes headlines. It’s a virus. It seems to be spreading through sexual contact. How long is it going to take you to find a cure?
  • It’s 1903, and you’ve just built the first airplane. The government wants you to build a spaceship to fly to the moon. How long will it take?
  • You are in charge of a construction company that has built hundreds of buildings in your metropolitan area. I want you to build me a twenty-story apartment complex very similar to the one you just finished on the other side of town. When will it be done?
  • You own a chair factory that produces 1000 chairs per month. I need 3500 chairs. Can you make them in a month?

The four questions above are radically different in nature. The first two involve significant unknowns, and require scientific or technological breakthroughs. The other two, not so much. Meta-question: does software development look like the first or the second kind? Another meta-question: what kind of software development are we talking about?

Let’s focus on the construction company. People have been constructing buildings for centuries. There’s relatively little variation in the effort and costs of putting up vanilla high-rises (we’re not talking about Burj Khalifa). Of course there is uncertainty: economic conditions could change, suppliers could go out of business. The new mayor could have a personal vendetta against your company because your big brother bullied him in school. All those things have happened before, perhaps in combination. Let’s say you can give me an estimate of 15 to 24 months with 98% confidence based on past data. Sounds good to me.
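To make "an estimate based on past data" concrete, here is a minimal sketch of how such a range could be read off a track record. The project durations are invented for illustration, and `empirical_range` is a hypothetical helper, not something a real construction company necessarily uses.

```python
# Hypothetical durations (in months) of past, similar high-rise projects.
# These numbers are invented for illustration only.
past_durations = [15, 16, 16, 17, 17, 18, 18, 18, 19, 19,
                  19, 20, 20, 20, 21, 21, 22, 22, 23, 24]


def empirical_range(durations, coverage_pct):
    """Return (low, high) bounds containing at least coverage_pct percent of the data.

    Discards an equal number of observations from each tail of the sorted
    data. Integer arithmetic is used so the result does not depend on
    floating-point rounding.
    """
    ordered = sorted(durations)
    n = len(ordered)
    # Number of observations we may drop from each tail.
    trim = n * (100 - coverage_pct) // 200
    return ordered[trim], ordered[n - 1 - trim]


for pct in (90, 98):
    low, high = empirical_range(past_durations, pct)
    print(f"{pct}% of past projects took {low} to {high} months")
```

The more confidence you ask for, the wider the range gets. With only twenty past projects, a 98% range is simply the full spread of what has been seen so far, which is why a long track record matters.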

If buildings were software you could break out a template, customize it a bit, install it on my plot of land in the cloud, the end. There are companies doing this for software, for example Bitnami (disclosure: I’m an investor). The process is so quick that you don’t even need to ask for a time estimate. You just see it happen in real time.

Let’s imagine that software, like physical things, could not be cloned at almost zero cost. Most software developers would be like monks copying manuscripts. If you have been handwriting pages for long enough, you can confidently tell me that it will take you at least 15 years to produce a high-quality copy of the entire Bible (it can be done 4x faster today, not sure about the quality though). You could get sick, or suffer interruptions. However, the absolute number of hours you’ll need is well known. There is a type of software development that works like this: porting old applications to a new language (say COBOL to Java back in the day).

Of course, the more rewarding problems in software development look nothing like this. I enjoy trying to solve problems that nobody has solved before. The malleability of software makes it easy to explore an open problem. Some problems are deceptive; at first they may look like building a house, and as you discover unknowns they sometimes mutate to resemble a quest for an AIDS vaccine. If a problem is solvable with software, it may take weeks or months to come up with an imperfect solution. It will probably take years to build one that’s scalable and robust. The meta-problem is that some problems cannot be solved with software, or at least not yet, or not by me / my team / my company. I might give you an estimate that would look like:

  • less than a month: 30% chance
  • less than a year: 40% chance
  • never: 30% chance
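One way to read an estimate like this is as a small probability distribution. The sketch below assumes the three buckets are mutually exclusive (they add up to 100%), so "less than a year" is taken to mean "more than a month but less than a year". That reading, and the helper name `cumulative_chances`, are assumptions made for illustration.

```python
# Outcome buckets from the list above, as whole percentages.
# Assumption: the buckets are mutually exclusive, so the middle bucket
# means "more than a month but less than a year".
buckets = [
    ("within a month", 30),
    ("within a year", 40),
    ("never", 30),
]


def cumulative_chances(buckets):
    """Return the running chance (in percent) of being done by each deadline."""
    total = 0
    result = []
    for label, pct in buckets:
        if label == "never":
            continue  # "never" adds nothing to the chance of finishing
        total += pct
        result.append((label, total))
    return result


for label, pct in cumulative_chances(buckets):
    print(f"done {label}: {pct}%")
```

Under that reading, the chance of having something within a year is 70%, and no deadline, however generous, gets the chance above that.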

Another kind of software development sits somewhere in the middle, and it may be the one that generates the most software jobs. Usually an organization wants a solution to a problem that has already been solved by others (e.g. building a cluster manager for a social network graph). Even though you don’t have access to the design or the code, you have an idea of what the solutions look like. You don’t know what key issues they ran into, how good the teams were, how lucky they got. Still, you know the problem can be solved by companies that look like yours for a reasonable cost. There is still quite a bit of uncertainty: you can estimate small tasks reasonably well, but you cannot predict which “week-long” task might expand to several months (e.g. it turns out that no open source tool solves X in a way that works for us, we’ll have to write our own).
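A rough simulation shows how a few runaway tasks distort a plan made of otherwise well-estimated ones. Everything in this sketch is an assumption chosen for illustration: twelve tasks estimated at one week each, a 5% chance that any given task expands to twelve weeks, and the helper names `simulate_project` and `percentile`.

```python
import random


def simulate_project(n_tasks, blowup_chance, blowup_weeks, rng):
    """Total weeks for one simulated project of nominally week-long tasks."""
    total = 0
    for _ in range(n_tasks):
        # Most tasks take the estimated week; a few expand dramatically.
        total += blowup_weeks if rng.random() < blowup_chance else 1
    return total


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct from 1 to 100)."""
    rank = max(1, -(-len(sorted_values) * pct // 100))  # ceiling division
    return sorted_values[rank - 1]


rng = random.Random(42)  # fixed seed so the run is repeatable
# All parameters below are invented for illustration.
runs = sorted(simulate_project(12, 0.05, 12, rng) for _ in range(10_000))

print("naive sum of estimates:", 12, "weeks")
print("median simulated total:", percentile(runs, 50), "weeks")
print("95th percentile:", percentile(runs, 95), "weeks")
```

The individual estimates are right most of the time, yet the total has a long tail, because you cannot tell in advance which task will be the one that blows up.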

The gist of why estimates are hard: every new piece of software is a machine that has never been built before. The process of describing how the machine works is the same as building the machine. The more your machine looks like existing ones, the easier it is to estimate its difficulty. Of course it won’t be exactly the same machine; that can happen with a chair but not with software. On the other hand, you may want to boldly build what no one has built before. In that case, you’ll most likely adjust your scope so that you can build something that makes sense for the timeframes you work with. The solution to the original problem might take iterations over generations. Not necessarily generations of humans, perhaps generations of product versions, teams or even companies. You may set out to put a man on the moon, and your contribution would be the first airplane.

11 Replies to “Software Development Estimates, Where Do I Start?”

  1. Very interesting. I enjoyed the comparisons very much.
    We work (at Caesar Systems) a lot on the last case.
    We have a very rich product, a piece of simulation software with lots of tools.
    We follow a sort of agile methodology, and for each new feature of the software we have a User Story divided into tasks.
    We have worked like this for several years now and, to address the influence of unknown unknowns on our estimates, we use a probability distribution on the number of tasks initially defined, with parameters adjusted by previous experience.
    So, we don’t give an estimate like “we will deliver this in three months”, but “we have an 88% chance of delivering between the 10th and 11th week”.

  2. I work as an accountant, and I’ve noticed that forecasting how long it will take to do something has a lot in common with forecasting how much something will cost. The greater the uncertainty, the greater the underestimate that we make. Humans have a consistent bias towards underestimating the time and expense of doing something.

    I think it is because we are really bad at estimating the unknowable. When there is a great deal of uncertainty, there is a lot that we can’t anticipate. Rather than build a cost or time estimate for the unknowable into our calculation, we just tend to ignore it. We simply estimate what we anticipate, which leads to a huge underestimate.

    1. Yes. That’s because most people give their best-case estimate of how long a task will take, even when they’ve been asked to give an “on average” estimate. Read “Software Estimation: Demystifying the Black Art” by Steve McConnell for more details on this phenomenon. That’s why PMs tend to double estimates given by their engineers.

  3. I wish I could provide estimates in the form of how long and how probable that duration is. Unfortunately all I am allowed to submit is how long. And that time span is always under heavy scrutiny.

  4. I really don’t see what this article has added. It sums up by saying that it’s easier to estimate things that have existing parallels (i.e., have been built before or something similar has been built before), to which I say, Duh! This is Estimating 101.

  5. Nice story on Quora about the hike. It demonstrates that the right people have to be asked about any estimate. The best approach is what is called reference class forecasting; asking a subject matter expert (SME) is acceptable as well. Thanks for the link, we’ll use this in our domain.

    Separating the need for an estimate from the skill and experience of the estimator is critical to the discussion of estimating.

    I work in the space flight business. The cost of a Low Earth Orbit (LEO) machine can be estimated to within 10% by knowing the mass of the spacecraft, independent of the sensor platform. Von Braun “estimated” the mass of the lunar lander that could return to the orbiter on the back of a napkin because he was a “subject matter expert.”

    Colleagues can estimate the cost and schedule to add features to a health insurance claims processing ERP system because they “know” the domain, context, and have reference classes.

    Setting out on a walk from SFO to LA without some type of reference class or consulting an SME is a good analogy of how estimating goes bad. But it’s not the process that goes bad; it’s the people doing the process.

    The conjecture that “every new piece of software is a machine that has never been built before” means you have no – ZERO – basis of reference. You have no – ZERO – understanding of the underlying elements of the work, or you have no – ZERO – understanding of what “done” looks like in some unit of measure meaningful to the decision makers.

    In this case, yes, “estimating is hard.”

  6. Estimating truly original or copycat software is not about giving final numbers that stick. If that’s what your client wants to hear, tell them it ain’t happening.

    Estimating is about proving an understanding of the concept and highlighting unknowns or even unknown unknowns. It’s about gleaning some insights to see if you have what it takes to try. The real estimates come when you’re well into designing and building, and even then it might be a guessing game.
