(Mis)estimation – why estimates tend to be wrong

Probably every software developer has had to estimate the effort needed to implement some piece of software. And the estimate was likely rather wrong. Over many years, I have collected the reasons why this happened to me, and I have concluded that the estimation approach itself is fundamentally flawed. Here are my reasons:

This blog post is based on one of my presentations and workshops about estimation in software development. So, it will be a long post. 😊

Thought experiment

Let’s start with a thought experiment.

We aim to build a small application with ten features (A to J).
We are fortunate because we have two things. We have Dave. Dave is a perfectly average developer. Whenever you take a group of developers, Dave is precisely as good as the average of the selected group. Remember, it’s a thought experiment; magic is allowed.
And we have Oria – the all-knowing oracle. Oria tells us how long Dave needs to build each feature. Dave can build A in 7 days, B in 1 day, etc.

My question to you is: how long does a team of five developers need to build the application?
Remember that Dave is the perfect average of these five developers and that Oria is always correct.

Take your time and come up with a number of days the team needs to complete the application.

No, really, think about it. What is your answer? I have time to wait. 😁

The most typical answers from all my presentations and workshops are either 16 days (= 80 days / 5) or “no clue”. And only the second one is reasonable.

Let me show you ten reasons why.

Reason 1: Diseconomy of Scale

When a team builds several features into a single application, we cannot simply sum up the estimates for the individual features. For example, when we build features A (7 days) and B (1 day), the resulting effort will likely be a bit higher than 8.

The reason is integration. Integrating many parts into a single system is an additional effort. Sometimes, we get lucky, and we can benefit from work done earlier. But sadly, most of the time, combining parts introduces problems that need to be solved:

  • The parts don’t fit together, and an adapter is required.
  • One part’s concept contradicts the other’s, and one part must be changed.
  • Integration gets a bit more complicated with every feature because the team members can’t keep the whole system in their heads anymore.

Concepts exist to fight these growing integration costs, like vertical slice architectures, but overall, integration still costs effort.

Let’s consider different integration costs for our thought experiment:

The above diagram shows the effort needed with 1%, 5% and 10% integration costs. An integration cost of 1% means that integrating each new feature takes an additional 1% of the effort already spent. With 1%, we would need 3.7 days longer than the simple sum of the individual estimates. With 5%, it would take us 20 days longer. And with 10%, we would need an additional 46 days.

In reality, we hopefully don’t have integration costs of 5% or more – although, for some legacy systems, this might be true. But we typically have many more features than just ten.

Let’s assume we want to build a system with 100 features, each of which would individually take one day to build (the small calculation after the chart description reproduces these numbers):

  • The blue line shows the simple sum – 100 features take 100 days.
  • The orange line shows the effort needed if there is a 1% additional integration cost per feature. We end up needing around 170 days. A delta of 70 days.
  • The grey line shows the same for an integration cost of 2%. It would take over 300 days to complete the system. A delta of over 200 days. More than three times as long as estimated!
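These numbers are easy to reproduce. Below is a minimal sketch, assuming my reading of the model above: integrating each new feature costs an extra fraction (the integration rate) of the effort already spent.

```python
def total_effort(feature_days, integration_rate):
    """Sum feature efforts plus a compounding integration cost:
    each new feature adds integration_rate * (effort already spent)."""
    spent = 0.0
    for days in feature_days:
        spent += days + integration_rate * spent
    return spent

features = [1.0] * 100  # 100 features, 1 day each

print(round(total_effort(features, 0.00)))  # 100 -> the simple sum
print(round(total_effort(features, 0.01)))  # 170 -> with 1% integration cost
print(round(total_effort(features, 0.02)))  # 312 -> with 2% integration cost
```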

I don’t know how significant the additional integration costs of your system are. That depends heavily on the kind of system you build, the chosen architecture and design, and the team’s skill level. But be assured that they cannot be neglected.

This diseconomy of scale – the more you want, the more expensive it gets – is one of the major reasons estimates are typically wrong – easily wrong by a factor of two or even more.

Reason 2: Team Synchronisation & Knowledge Management

We typically work as a team to build an application. When there is a team, the members need to synchronise themselves by communicating, and they need to manage their knowledge to function as a team.

The bigger the team gets, the more communication is needed. In my experience, a team bigger than eight people can’t function properly anymore because the need for communication brings everyone to a halt. Team synchronisation needs time that is normally not contained in the estimates. Relative size estimation techniques can help, but only if the team size doesn’t change.

The same holds for knowledge management. The bigger the team gets, or when adding additional teams, the efficiency of knowledge management sinks. A single person or a small team can keep much information in their heads. That’s great because when the information is in the heads, the developers can act on it – or do you consult your design guidelines every day before coding to comply with them?

None of this is new information. We can find it in the book The Mythical Man-Month: Essays on Software Engineering by Fred Brooks from 1975:

  • Only when the tasks are perfectly partitionable can more people work on them efficiently, with a linear connection between the time needed and the number of people working on them: double the people, half the time required.
  • When the tasks are unpartitionable, they take the same time regardless of the number of people working on them. Giving birth to a child takes roughly nine months, regardless of how many mothers you have!
  • In typical software development, we have a set of tasks with complex interrelationships. Some tasks can be worked on in parallel, some depend on each other, and some influence others. There is an optimum somewhere around 5 to 7 people (see the small calculation below).
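The communication part of this can be made concrete with the classic back-of-the-envelope calculation from The Mythical Man-Month: if everyone has to coordinate with everyone else, the number of communication paths grows quadratically with team size.

```python
def communication_paths(team_size):
    """Number of pairwise communication paths in a fully connected team: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2

for n in (2, 5, 8, 12, 20):
    print(f"{n} people -> {communication_paths(n)} communication paths")
# 2 -> 1, 5 -> 10, 8 -> 28, 12 -> 66, 20 -> 190
```

At eight people, there are already 28 paths to keep in sync, which matches my experience that larger teams grind to a halt.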

So, in our thought experiment, our team’s communication, synchronisation, and knowledge management add time on top of the simple sum of the feature estimates.

Reason 3: Changing Technology

I once had a conversation with a potential customer who asked, “How much does it cost? And please include the costs to keep the application up-to-date over the next ten years.”

A question that cannot be answered, at least not without lying.

The reason is that our application is built upon many existing libraries, frameworks, and platforms.

Any of these can change at any time. We once got an email from Microsoft Azure announcing that the Scheduler service was being discontinued and that we had to switch to Logic Apps. Not much choice here. We had to switch, which, of course, resulted in additional effort.

Depending on the change in the library, framework, or platform, the additional costs differ greatly:

  • If the new version only adds new things, we can ignore them, and the update probably takes a couple of minutes.
  • If we have to make some minor adjustments, we can hopefully get them done in a couple of hours, retest, and be happy.
  • If we have to replace or reengineer parts of our system, that quickly takes days to weeks.
  • In the worst case, a technology we depend upon is discontinued, and we need to replace it – and probably a lot of code that depends on it. This will take months.

Since we cannot foresee which libraries, frameworks, or platforms will change, or how, we cannot estimate the effort needed for updates.
Staying on old versions usually only pushes the problem further into the future – or invites some nasty security problems.

Reason 4: Emerging Quality Attributes

In today’s distributed, cloud-based, UI-heavy applications, predicting specific quality attributes is impossible. One can guess how performant, usable, and so on a system will be, but since we depend on so many services and platforms that we can’t fully control, many quality attributes will be emergent. By emergent, I mean that these attributes cannot be thoroughly planned for. You only see how they turn out once you build the system and put real-world load on it.

Here are some examples:

If you listen to your users’ feedback – yes, you should – you’ll never know whether you’ll need one or several cycles to get your application into a state your users like. Every cycle improves the quality but costs you additional effort.

Performance and reliability depend heavily on the products, services, and platforms you use. Only measuring performance and reliability in the live system provides good enough metrics. Even redeploying the whole system can result in different performance characteristics, depending on which servers your application lands on in the data centre. Measure, improve if necessary, and repeat.

Compatibility in an ever-changing world is hard. APIs of the services you talk to may change – announced or by accident. The connection to other services may change their characteristics regarding throughput, latency or availability. Monitor these connections and adapt if necessary.

There are frequently new security vulnerabilities. Adapt!

Even things under your control change: the data in the live system grows hour by hour, and the concepts in your code evolve over time. Keeping your application maintainable is not a negligible effort and is hard to predict.

Finally, the ecosystems our applications run in are ever-changing. We must keep up, or our users won’t be happy anymore. But again, we can’t foresee how the ecosystem will evolve over the following months or years.

All these quality attributes have in common that we need to build something and then get feedback through users, monitoring or measurement. And we can’t predict how many iterations it will take until the quality meets our expectations and needs.

Reason 5: Re-Work, aka Fixing Bugs

When we write code, we will introduce defects or bugs, no matter how hard we try not to (with good engineering practices, including testing of all kinds).

When a bug is introduced, it lingers around until someone stumbles upon it. We are lucky if it is a test and not so lucky if it is a user. Once we know the reason for the bug, we can fix it – which needs time, too. So, there is a time window when the bug is undetected and not yet fixed. If we build functionality upon the code containing the bug, this functionality is at risk of being affected by the bug or its fix. Maybe we need to change the design because of the bug fix, or change an algorithm because data takes a different form and so on.

Luckily, this happens rarely, but it happens. When it occurs, it means additional effort because the already written code has to be adapted or, in the worst case, rewritten.

Even if no other code is affected by the bug or its fix, the developers need to task-switch from building some new feature to fixing the bug. When we have a high bug rate, task-switching lowers efficiency significantly, resulting in slower progress and more time needed to complete the functionality we estimated.

And how long does it take to fix a bug?

Obviously, this depends on the kind of bug.
  • Often, bugs can be fixed within minutes.
  • Frequently, fixing a bug needs some analysis to find its root cause, but the fix itself is straightforward. So we may require a couple of hours.
  • Then there are the hard-to-reproduce bugs, on which we may spend several days to pin the root cause down.
  • Finally, and hopefully rarely, some bugs are caused by a fundamental flaw in the system, requiring reengineering of a part of it. These bugs quickly lead to weeks of additional effort.

Bugs are often a significant source of additional effort because of the time needed for analysis and fixing and the lower efficiency of task-switching.

Reason 6: Blockers

We can’t make any progress from time to time because we are blocked. Blockers come in many forms.

We can’t continue because information is missing. Or team members needed for the current task are absent. Infrastructure does not always work as needed; it needs updates, restarts, or re-configuration. Or there is one of the many other reasons why we can’t start a task. Finally, team building takes time as well.

In my experience, blockers happen all the time for many reasons. And they are not predictable; otherwise, we would have prevented them.

Reason 7: Unknown Unknowns

There will be things that we couldn’t anticipate, so-called unknown unknowns. Sometimes, we know that we don’t know something, and we may be able to estimate how long it will take to get to know it. But in software development, there are frequently things we couldn’t have thought of earlier; they catch us by surprise.

A very common situation is when a real user sits in front of your application for the very first time. Things you thought were easy turn out to be unsolvable for the user. The user expects things you thought were unnecessary, and so on.

All these unknown unknowns typically lead to additional effort.

Reason 8: The Need for Re-Architecting

I have seen it over and over again that teams run into a dead end and need to turn around.

For example, an architectural decision that sounded feasible some time ago turns out to be a misfit. An expected quality attribute cannot be met – maybe the system is too slow, or a needed feature cannot be added.

Over my years of architectural work within my own teams and consulting for other teams, I have seen these typical topics repeatedly lead to re-architecting parts of the system:

Translations
Translations are often treated as a simple lookup, with a single relevant language per user. And this gets you very far in most products.
But you may need multi-part translations (e.g. translations with placeholders for values). Or you may have to show an address rendered with French language settings inside a UI that is in German – a German-speaking user looking at the address of a French citizen.
Switching the translation mechanism and supporting multiple languages simultaneously may trigger a thorough redesign of the translation infrastructure, as the small sketch below illustrates.
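As a small illustration (hypothetical names, not from any specific library): the simple lookup at the top works for a long time, until placeholders and mixed locales force a richer mechanism.

```python
# Stage 1: translation as a simple lookup – one language per user, no parameters.
TEXTS = {
    ("de", "greeting"): "Willkommen",
    ("fr", "greeting"): "Bienvenue",
}

def translate(lang: str, key: str) -> str:
    return TEXTS[(lang, key)]

# Stage 2: multi-part translations need placeholders, and the UI language (German)
# may differ from the formatting locale of a single element (e.g. a French address).
TEMPLATES = {
    ("de", "invoice_due"): "Rechnung fällig am {due_date}",
}

def render(ui_lang: str, key: str, **values: str) -> str:
    return TEMPLATES[(ui_lang, key)].format(**values)

print(translate("de", "greeting"))                         # Willkommen
print(render("de", "invoice_due", due_date="31.03.2025"))  # Rechnung fällig am 31.03.2025
```

Retrofitting placeholders and per-element locales onto a plain key-to-text lookup touches every place a text is rendered – hence the thorough redesign.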

Communication
You may, for example, have to switch from asynchronous to synchronous communication between two parts of the system due to consistency needs.

Reporting
You probably start with generating reports from your only database where all the data lies. At some point, you may need to switch to a data warehouse to support all the use cases required at an acceptable speed.

Archiving data
After a couple of years in production, much data has accrued in your system. To get better performance, you want to archive parts of the data. If not planned for early, splitting data into live and archived data is hard – especially when big domain models are involved with referential integrity. Typically, the data model has to be changed, and the data needs to be migrated – a lot of work.

Time and Time Zones
When you started your software product, you probably didn’t think about time and time zones too much (unless you build time tracking software, as we do 😊).
But handling time gets difficult once you need to support time zones or the clock changes twice a year – see the small sketch below. You probably have to reengineer many parts of your system that interact with time and time differences.
Or even worse, you need to support bi-temporal data (data with two time components, like paying a bill now with a due date at the end of the month).
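As a generic illustration (my example, not from the original post): “one day later” is not the same as “24 hours later” once daylight saving time is involved.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

zurich = ZoneInfo("Europe/Zurich")
before = datetime(2025, 3, 29, 12, 0, tzinfo=zurich)  # the day before clocks spring forward
later = before + timedelta(days=1)                    # "one day later" in wall-clock terms

# Convert both to UTC to measure the real elapsed time.
elapsed = later.astimezone(timezone.utc) - before.astimezone(timezone.utc)
print(later)    # 2025-03-30 12:00:00+02:00
print(elapsed)  # 23:00:00 – only 23 real hours passed, because DST started in between
```

Code that silently assumed “a day has 24 hours” is exactly the kind of code that needs reengineering later.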

Scaling
Most systems can scale using vertical scaling for a long time – use a bigger machine. Once you have to switch to horizontal scaling, you have to face many new problems (see fallacies of distributed computing). Learning and handling all these new problems takes a lot of time.

Historic data
Imagine you have all the data of your application saved in a state-based CRUD (create, read, update, delete) style. Now, you need to implement a new feature that allows you to show how data has changed over time (history). Most of the time, depending on your current architecture and design, this is a major re-architecting effort.

Exception Handling
Another time when re-architecting is typically unavoidable is when you switch from a single relational database to several databases, especially when different kinds of databases are involved. Then, you usually have to switch from transaction-based mechanisms for ensuring consistency to compensation-based mechanisms: instead of rolling back a single transaction, we need to compensate for the actions taken earlier. Compensation is a much more flexible and powerful approach, but switching to it requires re-architecting the fundamentals of how you deal with consistency.
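A minimal sketch of the compensation idea (hypothetical steps; this pattern is often discussed under the name “saga”): every completed step registers an action that semantically undoes it, and on failure the completed steps are compensated in reverse order.

```python
def run_with_compensation(steps):
    """Run (do, undo) pairs; on failure, compensate completed steps in reverse order."""
    done = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()  # compensation, not rollback: the original effect did happen
        raise

run_with_compensation([
    (lambda: print("reserve stock"),   lambda: print("release stock")),
    (lambda: print("charge customer"), lambda: print("refund customer")),
    (lambda: print("create shipment"), lambda: print("cancel shipment")),
])
```

Every step now needs a well-defined compensating action, which is why this switch reaches deep into the design.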

There are certainly more triggers for re-architecting, but I hope you see how only one of the above can ruin your estimate tremendously.

Reason 9: Variability and Overutilisation

Variability describes how much things change over time. When developing software, many things show variability:

  • Team capacity is variable. Team members go on vacation and come back again. Team members leave and join the team.
  • The product backlog is updated frequently – at least, it should be – based on new insights.
  • Technology changes because of updates and because we introduce new technology to benefit from new possibilities.
  • And finally, variability itself changes, too. Stable periods are followed by more turbulent phases, which lead to stable times again, and so on.

Variability is always there unless you have a fixed team, without vacation, never change the product backlog, and stay forever on the same old technology.

Variability limits efficiency. Higher variability leads to less efficiency.
A good book describing this effect is This is Lean.

The above diagram shows the negative effect of higher variability on efficiency (the minus sign).
When efficiency sinks, effort grows (a negative effect), and utilisation grows as well. Therefore, we take longer to finish a task and are more utilised.

That is okay as long as utilisation stays in a healthy range. But once we hit a certain utilisation threshold, we switch to thrashing mode. We don’t have enough slack time to deal with variability. We are overutilised. An example is when you need a meeting with your superior on an urgent matter, but the earliest free slot in the calendar is two weeks away! There is no time to react to urgent matters.

Thrashing has a positive effect on (over)utilisation and a negative impact on efficiency.

Finally, effort rises and rises and rises. This is one reason why late projects tend to get later and later.

We cannot predict variability in most cases, so we can’t predict efficiency, so we can’t predict effort precisely.
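The link between utilisation and waiting time is well known from queueing theory; this is a generic illustration, not a model from this post: waiting time grows roughly with utilisation divided by the remaining slack, so it explodes as utilisation approaches 100%.

```python
# Rough queueing intuition: relative waiting time ~ utilisation / (1 - utilisation).
for utilisation in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    relative_wait = utilisation / (1 - utilisation)
    print(f"{utilisation:.0%} utilised -> relative waiting time {relative_wait:.1f}")
# 50% -> 1.0, 80% -> 4.0, 95% -> 19.0, 99% -> 99.0
```

That is the “two weeks until the next free calendar slot” effect in numbers.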

Reason 10: Plan vs Reality

Let’s assume we build the application from our thought experiment. We probably didn’t deliver the features we initially thought would be needed. We built a better-matching product because we listened to user feedback throughout development. Our customers and users are happy.

If so, what we delivered may look something like shown in the diagram below.

  • We delivered feature A as initially planned. It was, after all, our very first feature. We could predict that well.
  • But then we quickly realised that not everything we had planned for feature B was needed. Users were happy with three-quarters of it but needed a little addition (Z).
  • When we showed feature C, users immediately gave us a better alternative, Y. So we built feature Y after we had already built C.
  • Feature D was lacking some additional use cases, so we added them.
  • Feature E was good enough for our users after we had built half of it.
  • Then, we found out in conversations with potential customers that feature F was not needed at all.
  • G was a match, woohoo!
  • We delivered H as planned but needed three attempts to find a user-friendly solution.
  • Feature I turned out to be technically infeasible.
  • Finally, we only had to deliver half of feature J, but with some minor additions.

That happened in every project or product development effort I was part of in over 25 years of building software. What was finally delivered and useful was different from what was planned. That’s also why agile software development was invented in the first place: Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.

So, how useful is estimating features A to J when we know we will probably deliver something quite different?

The Common Theme: Unpredictability

All ten reasons above have the inherent unpredictability of software development in common. When we build software for the needs of actual users who give feedback, on an ever-evolving technology stack, we will never reach predictability. Customers, users, infrastructure, services, 3rd-party products and our team together form a complex system.

Most complex systems share a couple of attributes. I’m simplifying here; bear with me!

  • Interactions are more important than the attributes of the individual elements.
  • We don’t get double the output when we put in double the input. It could be way more or less, or even zero. That is called non-linearity.
  • Complex systems often show emergence. Put simply: the whole is more than the sum of its parts. For example, I can walk because of an emergent effect of my legs and hip. On their own, they can’t walk.
  • The elements adapt based on feedback from interactions with the other elements.
  • Finally, self-organisation happens through local interactions. The system shows patterns and behaviours that can’t be explained by the elements alone.

A nice intro to complexity theory is here.

The problem we humans have with complex systems is that they are counter-intuitive. In a complex system, there is no repeatability: just because something worked once does not imply that it will work again. And there is no predictability: we might be able to find the cause of an effect in hindsight, but we cannot predict the effect of an action. We can only try some reasonable action, observe the effect and adapt our next step.

First, we build a hypothesis. A hypothesis is not just a wild guess; it always needs some supporting facts: “Based on <facts>, we anticipate that <action> will lead to <outcome>”.
Then, we try to verify the hypothesis or to falsify it, which is usually easier.
Finally, we adapt our hypothesis based on the new knowledge gained from this experiment and repeat.

Bonus Reason: Missing Flow

The ten reasons above are all caused by unpredictability due to software development being a complex system. But there is one additional reason that lets estimates explode in many companies, which has nothing to do with complexity.

It is missing flow. Work on a task gets interrupted frequently. There is a handover to another person, work can’t continue and has to stop and be restarted later, or there is a lot of task switching going on. All of this hinders flow and, therefore, efficiency. You simply don’t get as much done per day as you thought.

When restarting or task-switching, we must build up the context again, losing time while doing so.

Why do you need estimates anyway?

So, I told you eleven reasons why your estimates are probably wrong. And why you should not trust the numbers. The most important question is, why do you even estimate? Who needs the estimate, and what for?

Typical answers are that we need a date when all of this is done, or we need to know how many developers we are going to need to get the product finished by a given date.

Sadly, both can’t be answered reasonably well if we build a new software product. Remember? Complexity.

Often, this need comes from thinking in projects instead of in products. When will this project be finished? We would never ask when this product is finished. A product is continuously improved; there will be version after version of an improved product. So, to get rid of estimates, you probably have to get rid of project thinking first. 😬

It is especially difficult if project leads have a bonus depending on meeting project deadlines.

Estimates are always wrong, but sometimes helpful.

Still, even when we think of our software as a product with a product lifecycle, we need to guess whether some endeavour (product version) is reasonable or not.
Can we build an MS Word clone with 8 people in 1 year? Certainly not.
What can we build with 8 people in 1 year? Probably a simple word editor with the most important functionalities.

You should think about the smallest product that could be sold, how many developers you could pay for the next year, and whether you want to take that business risk. The important part is that we should not talk about how long it takes to implement some list of features, but about whether we see a working business case to earn money with a new product and how much money we can invest into this bet. You should probably talk to your C-level executives or product managers about this, not your project leads.

Working without Estimation Arithmetic

When we prioritise the next tasks* to do, we bet on the highest value task with reasonable effort.

* I use the term task for anything to be built: a feature, a functionality, a user story, a use case, …

A task is the smallest thing that still provides value. If possible, we split tasks into smaller tasks that provide value on their own. This makes prioritisation easier because most tasks are rather small.

For a given task, we talk about the anticipated value. What would we gain from having this task done? Then, we make a quick wild guess on the anticipated cost. Could we build that in hours, days, weeks or months? If we think the task will take months, we either try to split it up if possible, or we really need to be sure that the value is immense. Otherwise, we simply do the task with the highest anticipated value first. The discussion about value is far more important to us than the quick estimation of effort.

Summary

The diagram below summarises the eleven reasons I gave you why planning with estimates does not work for modern software development – when we want to react to user feedback and keep technologies up-to-date.

My advice to you is to get rid of (most) estimates. They cause more harm than good.

How you can get rid of most estimates is highly context-specific. Start with the reasons why your organisation “needs” estimates. Make sure to dig deep! Then, find a way to achieve the same without estimates.

Good luck!

About the author

Urs Enzler
