This episode is part of a wider mini-series looking at Estimation in Software Development. In this episode, I ask the question, is AI the answer? Following on from the episodes on Qualitative and Quantitative approaches, it's easy to see there are pros and cons to individual practices. And realistically, it feels like you will need a blend of various approaches to create truly valuable estimates. But, this all seems like a lot of work and investment, a lot of training, tools and time to achieve. And even more so to iteratively improve to nudge towards those valuable estimates. As I'll say many times in this series, producing estimates costs. Producing valuable estimates costs a lot. Thus, it doesn't take much of a leap of logic to ask the question, can AI help us here? In an ideal world, there would be an AI-powered tool that would just do the work for us. Thus, I explore how such a tool could come into being, and probably more importantly, why I doubt it will happen any time soon.
Published: Wed, 19 Feb 2025 01:00:00 GMT
Hello, and welcome back to the Better ROI from Software Development podcast.
This episode is part of a wider mini-series looking at estimation within software development. I started the mini-series in episode 189 by providing the following guidelines:
Subsequent episodes take a deeper dive into specific aspects of estimation in software development. And while long-term listeners may find a degree of repetition across the series, I wanted each episode to be understandable in its own right and, as far as practical, to be self-contained advice.
In the last few episodes, I have looked at a number of methods that fall either under the Qualitative approach of software estimation, or the Quantitative approach of software estimation.
Qualitative estimation is predominantly based on expert judgment, a subjective thought process.
Whereas Quantitative estimation is based on things we can count or calculate, using statistical analysis of historical data.
For me, there are various pros and cons to either approach, and realistically, it feels like you need a blend of various approaches to create truly valuable estimates over time.
Back in episode 191, I introduced the shorthand of Valuable Estimate, an estimate that is desirable for the organization asking for it, and for it to be valuable, it will likely need to have an acceptable level of accuracy and precision.
The accuracy of an estimate is how close the estimate was to the actual value.
The precision of an estimate is how close to the actual value we are attempting to be.
But this all seems like a lot of work and investment - a lot of training, tools and time to achieve - and to iteratively improve to nudge towards that truly valuable estimate over time.
As I've said before, producing estimates costs. Producing valuable estimates costs a lot.
Thus, it isn't much of a leap of logic to ask the question, can artificial intelligence help us here?
This episode is probably more a thought experiment.
In an ideal world, there would be an AI-powered tool that would just do the work for us, but at the time of publishing, I can't see one. Thus, in this episode, I explore how such a tool could come into being, and probably more importantly, why I doubt it will happen any time soon.
Let's take a step back.
Putting artificial intelligence to one side for now, can computer automation help?
Regardless of whether the approach is Qualitative, Quantitative, or a combination of the two, computer automation should be really helpful. Computers can process raw data much faster than we can by hand, so using them seems sensible. And to an extent, I'd argue that's what Monte Carlo simulations and SPERT tools are helping us with.
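To make the Monte Carlo idea concrete, here is a minimal sketch. The historical task durations are entirely invented for illustration; the technique is simply to resample past durations many times and read off percentile outcomes.

```python
import random

# Invented historical cycle times (days per task) - purely illustrative.
historical_durations = [2, 3, 3, 4, 5, 5, 6, 8, 13]

def simulate_project(num_tasks, trials=10_000, seed=42):
    """Monte Carlo forecast: resample past durations to simulate a
    project of num_tasks tasks, many times over."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choice(historical_durations) for _ in range(num_tasks))
        for _ in range(trials)
    )
    # Read off the 50th and 85th percentile outcomes.
    return totals[int(trials * 0.50)], totals[int(trials * 0.85)]

p50, p85 = simulate_project(num_tasks=20)
print(f"50% chance of finishing within {p50} days, 85% within {p85} days")
```

The output is a probabilistic range rather than a single number, which is exactly the shift the quantitative approaches in earlier episodes encourage.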
And you could argue that if we had the data, we could even produce a checklist of standard software development activities and the average time each takes. However, in practice we rarely have that data, and even when we do, it's rarely representative enough to be useful.
So, does Artificial Intelligence fix everything?
At the moment, no.
Current AI systems, often referred to as narrow or inference-based AI, are built for specific tasks. Examples include voice assistants, image recognition, and recommendation systems.
These AI systems excel in their designated areas, but lack the ability to perform tasks outside of their trained capabilities, or generalize knowledge to other domains.
Whereas, for our purposes, I suspect we will need more General Intelligence, known as Artificial General Intelligence or AGI.
AGI refers to a type of artificial intelligence that possesses the ability to understand, learn and apply knowledge across a wide range of tasks at a human-like level. AGI can perform any intellectual task that a human can, with the ability to transfer learning and reasoning across different domains.
AGI is the future we have been promised, or warned about, from countless sci-fi movies and books.
And while many of the current AI systems can feel outwardly like Artificial General Intelligence, they're arguably a long way from it.
Yes, you might be able to have a reasonable conversation with ChatGPT, but it's narrowly trained for natural language processing, conversing with us through language. It's not trained on how to drive a car.
This inference training is based on training a model on a large quantity of data from a narrow domain.
So in the case of ChatGPT, it has been trained by taking large quantities of written content, tokenizing it (turning words into numbers), and then building relationships between those tokens. It then uses maths over those relationships to make inferences.
The model becomes a map of distances between those tokens, similar to a physical map giving the distances between cities.
By having this map, you can infer which cities are similar to London by looking at the distances. So Paris is more similar than Beijing or New York, at least based on physical distance.
Now imagine not just representing the physical distance between the cities. Imagine including hundreds of dimensions, such as culture, size, industry, religion, population, etc. And you start to get an idea of what an inference model is.
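The city analogy can be sketched in a few lines. All the feature values below are made up purely to illustrate "distance" across several dimensions at once; real models learn hundreds of dimensions from data.

```python
import math

# Made-up feature values per city: population (millions), plus invented
# "culture", "industry" and "finance" scores. Illustrative only.
cities = {
    "London":   [9.0, 8.0, 9.0, 7.0],
    "Paris":    [11.0, 8.5, 8.0, 6.5],
    "New York": [8.5, 6.0, 9.5, 9.0],
    "Beijing":  [21.0, 4.0, 9.0, 3.0],
}

def distance(a, b):
    """Euclidean distance between two points in n-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Which city is "most similar" to London across these dimensions?
others = {name: distance(cities["London"], vec)
          for name, vec in cities.items() if name != "London"}
closest = min(others, key=others.get)
print(closest)  # with these toy numbers: Paris
```

Swap the cities for word tokens and the four dimensions for hundreds of learned ones, and you have the essence of an inference model's "map".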
As humans, we generally struggle to comprehend beyond three dimensions. Take, for example, any graph that has an x-axis, the horizontal, the y-axis, the vertical, and the z-axis, the depth. And even then, the z-axis can be difficult to represent in two-dimensional images.
Computers do not have that limitation.
Because the computer can reference across many more dimensions, it can use that to make inferences. For example, think about this maths question.
King minus male plus female equals what?
Because of how the model has been trained, it can effectively look at the difference between male and female and apply that same difference to king to get to queen.
Internally, these are all multi-dimensional coordinates.
I'm not expecting anyone to understand the details of model creation from my description, but the key takeaway is that the model is using multi-dimensional coordinates and distances between points to calculate its response. It's using maths to predict the next token, which it then converts back to a word, which it prints out as part of the ChatGPT response to our question about how to air fry the perfect roast chicken.
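The king/queen arithmetic can be shown with toy two-dimensional vectors. The numbers below are invented (one made-up "royalty" dimension, one made-up "gender" dimension); real embeddings are learned, not hand-written, but the arithmetic is the same.

```python
# Toy 2-D "embeddings": dimension 0 is an invented royalty score,
# dimension 1 an invented gender score. Illustrative only.
vectors = {
    "king":   [0.9, 0.9],   # royal, male
    "queen":  [0.9, 0.1],   # royal, female
    "male":   [0.1, 0.9],   # not royal, male
    "female": [0.1, 0.1],   # not royal, female
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def nearest(v):
    """Return the word whose vector is closest to v (squared distance)."""
    return min(vectors, key=lambda w: sum((x - y) ** 2 for x, y in zip(vectors[w], v)))

# king - male + female lands exactly on queen's coordinates
result = nearest(add(sub(vectors["king"], vectors["male"]), vectors["female"]))
print(result)  # queen
```

The model never "knows" what royalty is; subtracting male and adding female simply moves the point along the gender dimension, and the nearest stored point happens to be queen.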
It's impressive in what it does, but it is just basic maths.
It has no idea what a king is, what male or female or queen or royalty or roast chicken is. It doesn't know what any of these words really mean. Just the mathematical distance between them.
But, could we train a new model? A model trained on historic estimation data? A model that could then use historic data to infer the estimate for a new piece of work?
In theory, yes. In practice, I doubt it will be practical or effective.
Training an AI model is incredibly reliant on the quality and quantity of the data.
It needs quantity so that it has enough examples to be able to infer a reasonable model. And when I'm talking quantity here, I'm talking "big data" levels of quantity. Even the largest of organizations is unlikely to have enough data, assuming they've even been collecting it.
"Big data" is one of the cornerstones of the current artificial intelligence bubble. Much of the theories behind the current AI has been circulating for decades. It's only recently that we've had access to the quantity of data, "big data", and the necessary processing power, "the cloud", to utilize it.
Simply having the last 12 months' worth of your team's estimations is not going to be even a fraction of the amount you need to train the current models.
Ok, let's assume you've managed to find the quantity. Our next problem will be the quality of the data. AI model training is very much rubbish in, rubbish out.
As anyone who has ever trained an AI model will tell you, the biggest activity by far is making sure the data is of good quality. This is the unsexy drudgery that gets glossed over in the hype of AI.
Just because you have all this data doesn't mean it's consistent. And given the volume would have to come from a variety of sources (many organisations around the world), there is zero chance the data would be consistent in its terms and understanding.
I talked previously, in episode 194, about how even within the same team there'll be inconsistency of understanding, let alone across an organisation, let alone across a data set of this size.
In practice, this data does not exist in either the quantity or the quality needed to build a model, and I can't see a use case that would prompt anyone to try to collect it.
So, if the current narrow inference-based training is unlikely to provide an AI estimator, will Artificial General Intelligence?
I'd say this is more technically possible - but unlikely to ever occur.
Artificial General Intelligence wouldn't be reliant on training data to make its inferences. Rather, it could monitor and intuit from available data. It would be better at drawing correlations across different data types, different team types, and different outcomes.
It would be much more likely to be able to intuit patterns within the chaos, very much the same way that the human brain does. At which point we have to wonder if this is actually any better or different than the Qualitative expert judgment I talked about in episode 200.
But why do I say it's unlikely to provide us with an AI estimator?
Simply put, when we achieve Artificial General Intelligence, then everything changes.
Once we achieve Artificial General Intelligence, I question if software development will even exist. It will arguably be the biggest evolutionary change in the human species. It seems unlikely that anyone will be worried about software estimation at that point.
However, before we bow to our AI overlords, we have to consider when Artificial General Intelligence is likely to be achieved.
There is a broad range of opinion on this, ranging from "it's just around the corner", to "it's decades away", to "it will never be achieved".
Personally, I believe we are talking decades. I believe it will happen, but I'm less sure if it will happen within my lifetime, or at least be viable to be in general use.
So, for now, we are still with my original guidelines for software estimation:
In this episode, I've deviated from the normal format and used it as a thought experiment to consider if artificial intelligence helps us with producing valuable software development estimates.
I seriously doubt that the data exists to exploit the current narrow, inference-based models, and when we eventually achieve Artificial General Intelligence, I doubt software estimation will even be a question.
In the next episode, I move on to the penultimate episode in this Software Estimation mini-series. I want to discuss how Software Estimation works in terms of professionalism. In many cases, surprisingly, the professional response is to not provide an estimate.
You may be asking, why would an episode on professionalism be included in a series focused on software estimation?
This is, for me, closely related to episode 198, where I talked about the psychological scarring left behind from decades of using estimates as punitive targets.
In a similar way, how many developers have been described as unprofessional when they legitimately cannot provide an estimate?
What is the more professional response? To give the off-the-cuff guess to keep the software development process moving? Or take the harder path of explaining why you can't provide that estimate?
Thank you for taking the time to listen to this episode. I look forward to speaking to you again next week.