#202: Estimation - Quantitative approaches

This episode is part of a wider mini-series looking at Estimation in Software Development.

In the last couple of episodes, I've looked at a number of methods that fall under the Qualitative approach to software estimation.

Qualitative estimation is predominantly based on expert judgement, something based on subjective thought processes.

In this week's episode, I want to move on to discuss some Quantitative estimation approaches.

While Qualitative estimation is predominantly based on expert judgement, Quantitative estimation is based on something we can count or calculate, using statistical analysis of historical data.

In this episode, I specifically want to discuss two quantitative techniques - Monte Carlo simulations and Statistical PERT (or SPERT for short).

Published: Wed, 12 Feb 2025 01:00:00 GMT

Links

Transcript

Hello and welcome back to the Better ROI from Software Development podcast.

This episode is part of a wider mini-series looking at estimation in software development. I started the mini-series in episode 189 by providing the following guidelines:

  1. Don't invest in estimates unless there is a clear, demonstrable value in having them.
  2. Agree what a valuable estimate looks like. This will likely be a desirable level of accuracy and precision for an estimate.
  3. Provide the team with training and time to develop their estimation skills.
  4. Collect data on points 1 to 3 and regularly review whether you have the correct balance.

Subsequent episodes take a deeper dive into specific aspects of estimation in software development, and while long-term listeners may notice a degree of repetition across the series, I want each episode to be understandable in its own right and, as much as practical, to be self-contained advice.

In the last couple of episodes, I've looked at a number of methods that fall under the Qualitative approach to software estimation.

Qualitative estimation is predominantly based on expert judgement, something based on subjective thought processes.

In this week's episode, I want to move on to discuss some Quantitative estimation approaches.

While Qualitative estimation is predominantly based on expert judgement, Quantitative estimation is based on something we can count or calculate, using statistical analysis of historical data.

In this episode, I specifically want to discuss two quantitative techniques - Monte Carlo simulations and Statistical PERT (or SPERT for short).

But, before I talk about each, let's start with some commonalities.

At a high level, both take data and produce an estimate, or a range of estimates from it.

Take, for example, a situation where we have historical data from the past two years of our development team's work. These approaches can be used with that data to produce estimates for future work.

Both techniques anticipate that, by taking the subjective Qualitative element out of the estimation process and being data-based, they will remove the bias that can creep in from subjective judgement and potentially reduce the burden on the delivery team of producing the estimates.

Personally, I don't believe that any of the Quantitative approaches deliver on this fully hands-off dream. There are still elements of Qualitative activities involved, thus room for bias to creep in. However, more on this once I've gone through the two techniques in a bit more detail.

Let's summarise the Monte Carlo simulation.

Monte Carlo simulations help estimate software development timelines and costs by leveraging historical delivery data. This method involves running thousands of simulations using different possible outcomes based on past project data to predict a range of future scenarios.

Imagine you're planning a road trip and have data on travel times from past trips. Instead of guessing the travel time, you consider various factors like traffic and weather, running numerous scenarios to see a range of possible travel times.

In a similar way, in theory, Monte Carlo simulations provide a range of possible project completion dates and costs, helping with more accurate and reliable planning by accounting for possible variations and uncertainties.

At a high level, if you want to run a Monte Carlo simulation, you'll download one of the countless Excel-based examples and run through the following steps:

  1. Gather historical data. Collect historical delivery data including tasks, duration, costs, and resource utilization.
  2. Prepare the data in Excel. Organize the data in Excel, categorize tasks, noting their historical durations and costs.
  3. Define the probability distribution. For each task, determine the probability distribution based on historical data to model variability and uncertainty. A probability distribution describes how the values of a random variable are distributed. It shows the likelihood of each possible outcome.
  4. Set up the simulation. Input the probability distributions and define the number of simulations to run, e.g. 10,000.
  5. Run simulations. Execute the Monte Carlo simulation to generate a range of possible outcomes for project timelines and costs.
  6. Analyze results. Review the simulation output to identify the most likely project completion date and cost, along with best-case and worst-case scenarios.

The key here is the simulations. By using past data, the probability distributions and a random number generator, each simulation will generate a possible outcome. By running many simulations, e.g. 10,000, we expect the average to give us a reasonable estimate, including best-case and worst-case scenarios.
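
To make the mechanics concrete, here's a minimal sketch in Python of the kind of calculation those spreadsheet templates perform. The task categories, the figures, and the choice of a normal distribution are illustrative assumptions only; a real model would fit distributions to your own historical data.

```python
import numpy as np

# Hypothetical historical data: mean and standard deviation of duration
# (in days) for each category of task, derived from past delivery records.
historical_tasks = {
    "api_endpoint":   {"mean": 5.0, "std": 2.0},
    "ui_screen":      {"mean": 8.0, "std": 3.0},
    "data_migration": {"mean": 12.0, "std": 5.0},
}

rng = np.random.default_rng(seed=42)
n_simulations = 10_000

# Each simulation draws a random duration for every task from its
# probability distribution (a normal distribution is assumed here purely
# for illustration) and sums them into one possible project duration.
totals = np.zeros(n_simulations)
for task in historical_tasks.values():
    draws = rng.normal(task["mean"], task["std"], size=n_simulations)
    totals += np.clip(draws, 0, None)  # durations can't be negative

# Analyse the results: the spread of the 10,000 outcomes gives the
# most likely, best-case and worst-case picture described above.
print(f"Most likely (median):  {np.median(totals):.1f} days")
print(f"80% confidence:        {np.percentile(totals, 80):.1f} days")
print(f"Best case (5th pct):   {np.percentile(totals, 5):.1f} days")
print(f"Worst case (95th pct): {np.percentile(totals, 95):.1f} days")
```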

Some points to consider when you're looking at Monte Carlo.

It requires some understanding of statistics and probability. I'd argue in most cases, a level of training or learning would be needed to gain valuable estimates.

The accuracy of the simulation is highly dependent on the quality of the input data and the appropriateness of the chosen probability distribution. And there are a number of places where the Qualitative, the expert judgement, creeps in, such as choosing the probability distributions, along with judging whether the task to be estimated is actually related to any of the historical data.

At a conceptual level, Statistical PERT (or SPERT) is similar.

It's a tool, generally a spreadsheet or a specialised application, that can be used to estimate timelines and costs by utilising three-point estimates derived from historical data: the optimistic best case, the pessimistic worst case, and the most likely scenario for each task. The technique leverages the historical data to calculate those three points.

Imagine planning a project with a best, worst and most likely outcome for each task. SPERT then combines these estimates to calculate a weighted average, producing a prediction of the project's overall timeline and cost.

To use SPERT, you'd go through the following steps.

  1. Gather historical data. Collect historical data, focusing on task duration, cost and resource usage.
  2. Identify tasks. Break down the project into individual tasks or user stories.
  3. Define the three-point estimate: optimistic, pessimistic, and most likely. Optimistic is the best-case scenario based on the historical data. Pessimistic is the worst-case scenario based on the historical data. Most likely is the most probable scenario based on the historical data.
  4. Calculate expected durations. This is based on the PERT formula, which uses the three-point estimate. This can be done by hand and the formula can be found online (there's also a sketch after this list), but realistically you'd want to use a tool to do this.
  5. Calculate the standard deviation. Estimate the variability for each task. Again, you can let the tool handle this.
  6. Aggregate Estimates Sum the expected durations and standard deviations for all tasks to get the overall project estimates.
  7. Review and adjust. Continually refine estimates as more information becomes available and tasks progress.
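
For steps 4 to 6, the classic PERT formula weights the most likely value four times and divides by six, with the standard deviation approximated as a sixth of the spread between optimistic and pessimistic. Here's a minimal Python sketch of that calculation (not the Statistical PERT spreadsheet itself); the task names and figures are made-up examples, and summing the standard deviations directly, as in step 6, is a simplification of what a dedicated tool would do.

```python
# Three-point estimates (optimistic, most likely, pessimistic) in days.
# The tasks and figures are made-up examples for illustration only.
tasks = {
    "login_page":       (2, 4, 9),
    "payment_flow":     (5, 8, 20),
    "reporting_module": (3, 6, 14),
}

total_expected = 0.0
total_std_dev = 0.0

for name, (optimistic, most_likely, pessimistic) in tasks.items():
    # Classic PERT weighted average: the most likely value counts four times.
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    # Common PERT approximation of a single task's standard deviation.
    std_dev = (pessimistic - optimistic) / 6
    total_expected += expected
    total_std_dev += std_dev  # simple sum, as described in step 6
    print(f"{name}: expected {expected:.1f} days, std dev {std_dev:.1f}")

print(f"Project: expected {total_expected:.1f} days "
      f"(+/- {total_std_dev:.1f} days)")
```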

By utilising a three-point estimate, you are able to generate a graph of potential outcomes, normally following a bell-shaped curve, where the probability of the estimate being correct is low for the optimistic, highest for the most likely, and low again for the pessimistic. This allows us to say, for example, that a given point on the graph represents at least an 80% likelihood.
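
As an example of reading a confidence level off that bell curve, assuming the project total follows a normal distribution with the expected duration and standard deviation from the sketch above (illustrative figures only), the 80th percentile gives the duration we could quote with at least 80% likelihood:

```python
from statistics import NormalDist

# Assumed project-level figures (expected duration and standard deviation),
# e.g. the totals from the sketch above. Illustrative only.
expected = 20.8   # days
std_dev = 5.5     # days

curve = NormalDist(mu=expected, sigma=std_dev)

# The duration we can quote with at least 80% confidence is the 80th percentile.
confident_estimate = curve.inv_cdf(0.80)
print(f"80% likely to finish within {confident_estimate:.1f} days")

# Conversely, the likelihood of hitting a specific target, say 22 days:
print(f"Chance of finishing within 22 days: {curve.cdf(22):.0%}")
```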

From a communication perspective, the graph is a really good way of representing the uncertainty of the estimate to the wider community. With something that has high certainty, you'd expect a narrower bell curve than for something with less certainty, ultimately ending up with something that looks like a pancake if there is so little certainty that the pessimistic estimate is far removed from the optimistic.

Again, similar to Monte Carlo, this requires some understanding of statistics and probability. And I'm certainly not going to try and explain standard deviation on a podcast. It's something that a Google search is much more likely to make understandable.

So again, I'd argue that in most cases, a level of training or learning would be needed to gain valuable estimates.

And again, the accuracy of the simulation is highly dependent on the quality of the input data and the appropriateness of the chosen probability distribution.

And again, there are a number of places where the Qualitative, the expert judgement, creeps in.

Both Monte Carlo and SPERT are interesting techniques, and I freely admit to not having any real experience using them. But neither seems likely to give you a silver bullet for software estimation. Both will be calculating estimates, but we have to remember that means they are not reality. Thus, as with any estimation technique, we should not be putting undue expectation on them, regardless of how impressive the maths or statistical modelling that goes into them may be.

There's probably more danger here, with a Quantitative estimate, that people will expect the estimate to be precise. Any form of statistical processing is built to identify a trend. It isn't expected to be right 100% of the time, and it certainly won't be. You will have outliers.

Any individual estimate can be wildly out and should be expected to be.

However, over time, we would expect statistical techniques to produce a solid trend.

I do, however, really like the visualisation that can be achieved through SPERT. I'll provide a link in the show notes to the Statistical PERT website, which provides a free Excel download if you're interested in taking a look at how it works.

It should be noted that both approaches need time and effort to understand, set up and refine. Even more if there's a limited understanding of statistics and probabilities within the team. But I'd certainly be interested in hearing if you've had any success in using either of these techniques within your own teams.

In this episode, I wanted to introduce two Quantitative techniques: Monte Carlo simulations and Statistical PERT.

While these are interesting techniques based on applying statistics and probabilities to historical data, they are not without effort to implement.

Yes, they can seem like the golden path, removing from the team the effort of producing estimates, but they, like anything else, are not a silver bullet.

With both Qualitative and Quantitative approaches, it's easy to see that there are pros and cons to individual practices. And realistically, it feels like you need a blend of various approaches to create truly valuable estimates.

But, this all seems like a lot of work and investment - a lot of training, tools and time to achieve - and to iteratively improve to nudge towards that valuable estimate over time.

As I've said many times in this series, producing estimates costs. Producing valuable estimates costs a lot.

But it doesn't take much of a leap of logic to ask, can artificial intelligence help us with this?

And this will be a subject of exploration in the next episode.

In an ideal world, there would be an AI-powered tool that would just do the work for us. Thus, I explore how such a tool could come into being, and probably more importantly, why I doubt it will happen any time soon.

Thank you for taking the time to listen to this podcast. I look forward to speaking to you again next week.