The State of DevOps report provides excellent insight through rigorous analysis of its wide-reaching survey. I introduced the State of DevOps report back in episode 13, and in this episode I take a look at the 2021 edition. Listen to this episode if you are looking for the justification for "why" you should invest in the effort and disruption that DevOps requires. Or listen to this episode to understand "how" the practices can help you produce better outcomes from software development.
Or listen at:
Published: Wed, 09 Feb 2022 17:09:04 GMT
Hello, and welcome back to the Better ROI from Software Development podcast.
In this episode, I want to talk about the latest installment of the State of DevOps report.
Why might this topic be of interest to you?
First and foremost, I think the State of DevOps report provides an excellent justification for why you would want to look at DevOps and its practises. It provides clear evidence of outcomes based on using these sorts of processes. It provides a business case.
It's based on a large body of research - rigorous analysis of the improvements in business outcomes experienced by a wide base of survey respondents.
In that analysis, they're able to take that research and group respondents into elite, high, medium and low performing organisations.
It also provides a means for you to be able to benchmark your own organisation against those groups.
And it provides conclusions and advice, based on correlations in that data, regarding various practises that help your organisation improve its software development and ultimately its business outcomes.
So what will this episode cover?
Well, I'll give a recap of the State of DevOps report and what it is - I've discussed it previously back in episode 13, but I'll go over it again here. I'll talk about the key differences the report found between the elite performers and the lowest performers. I'll talk about the metrics they use to get to that. I'll talk about how you can benchmark your own organisation. And in future episodes, I will look specifically at each of the capabilities that the report provides guidance on to help you drive better performance.
So what is the State of DevOps report?
I originally talked about this back in episode 13; It's a survey, analysis and report carried out by the DORA team - DORA standing for DevOps Research and Assessment. Over the last seven years of research, the team have taken data from over 32,000 professionals worldwide. The team pride themselves on the statistical rigour and scientific approach taken to firstly producing the survey, then running the analysis, and finally producing the conclusions in the form of a report. They claim that it's the longest running, academically rigorous research investigation of its kind.
So when we talk about State of DevOps, what is the "DevOps"?
Well, I've introduced DevOps previously in episode 10. Unfortunately, the term DevOps, like so many terms in technology, has been corrupted over time. Too many marketing departments have gone, "Oh, we must have DevOps", and it's become a catch-all. More often than not, you'll find a product with an "includes DevOps" label stuck to the side of it.
So with so many definitions, I do like to use the Microsoft definition. Microsoft defined DevOps as:
"A compound of development (Dev) and operations (Ops), DevOps is the union of people, process, and technology to continually provide value to customers."
DevOps for me, actually addresses one of the fundamental challenges within traditional IT.
You normally have two departments with opposing goals: you have a development team whose goal is change - to make new things - and you have an operations team who look after stability - they don't want change.
So you have two conflicting parties, one that wants to change and one that doesn't want change. I'm oversimplifying greatly, but it does happen.
So by using DevOps in some of the practises and processes, you start to embed the ability to change with the controls needed by operations to make sure that it's safe to do so.
And while obviously DORA and the report are focussed on DevOps practises, I believe that many of the practises are universal. They provide benefit regardless of whether you're officially following the DevOps culture. In short, don't get hung up on the term DevOps.
The latest edition of the DevOps report, the 2021 version, is the seventh edition. For me, the report provides clear evidence that DevOps practises greatly improve business outcomes. It provides the "why" that justifies the disruption that implementing these practises can create. It provides the "how" - how you should measure your software development to get better organisational outcomes. And it provides guidance and advice on specific practises that are shown to drive performance.
Let's look at the why, the evidence, the hook, the sales pitch as to why you want to not just implement these practises but excel at them.
The report groups its respondents into different groups: you have the elite performers, the high performers, the medium performers and the low performers. And I'll look at how these groupings work in a little more detail when talking about the metrics used.
But the key thing is that the report finds the elite performers are so much better than the low performers - producing astonishing results in how they're doing their software development and ultimately the business outcomes they achieve.
The headline figures, when you compare the elite to the low performers, show the elite performers achieving:
3x lower change failure rate - 3x less likely to have problems during a release.
973x more frequent code deployments than the low performers.
And in terms of time to recover from incidents, and time from commit (when the software developer has made the code change) to deployment, the elite performers are 6570x faster than the low performers.
The report even specifically calls out that the 6570x figure is not an editorial typo.
Let's take a moment for that to sink in.
I think the final thoughts from the report summarise it all nicely:
"After seven years of research, we continue to see the benefits that DevOps brings to organizations.
Year over year organizations continue to accelerate and improve."
Teams that embrace its principles and capabilities can deliver software quickly and reliably, all while driving value directly to the business."
Ok, so hopefully the why has whetted your appetite to understand a bit more. For me, the why is quite an incentive to be looking at these things, not just to implement them in the first place, but to continually look to gain improvements.
At the heart of the survey and the report and the analysis are the metrics used.
Now, for me, I'm always a little bit cautious when it comes to metrics, especially around software development. I've seen metrics being put in place that, in my opinion, drive many dysfunctional behaviours.
Metrics, unfortunately, become targets and targets cause myopic, dysfunctional outcomes. I've talked about this previously.
I have a real concern about things like salespeople and their commission. If you target a salesperson purely on selling, and they're financially rewarded for it, you've narrowed the focus of what they're interested in. You may think that sales is the only thing they should be thinking about. But are they really getting you good sales? Are they making sure that the sale is good for your customer? Are they making sure that the sale is good for you as an organisation? Or are they just trying to hit that target, hit that metric, so they get financially rewarded for it? Unfortunately, I've seen too many places where the salesperson is over-focussed on that commission target. They exhibit dysfunctional behaviours, including selling the wrong thing to the customer, potentially lying to the customer: "Of course it can do that. It's not a problem. We can definitely do that for you". That's selling the wrong thing - not the thing that is good for the customer or for you.
Again, this is why I'm very cautious about metrics becoming targets and driving that myopic, dysfunctional outcome.
DORA, however, have put effort into making sure the metrics are appropriate as a group. Taken together, the five should highlight myopic, dysfunctional behaviour arising from over-focus on any one subset.
And we'll talk about that as I go through - but by having almost a balancing act between the metrics in use, I would expect overfocus on one to highlight issues in another.
Before I talk about the metrics themselves: we really should be thinking about these per product - per software system that you may be building internally within your organisation. I often find that in any organisation there are many, many products with differing capabilities. So while you might measure your flagship product on these metrics and it might look good, what about the system that hasn't been touched for many years - the one sat in the corner under cobwebs? How would that rate if you went through the same metrics? I'll come back to this in a bit.
Ok, let's get onto the metrics;
The first category to talk about is Throughput - the throughput of software developed and released by your team. It's made up of two metrics: deployment frequency and lead time for changes.
Deployment frequency asks: how frequently are you releasing your product into production in a safe and reliable manner?
This is a good time to tie our metrics back to those groupings - elite, high, medium and low.
So when we're talking about deployment frequency, they categorise low as fewer than once per six months. Medium is between once per month and once every six months. High is between once per week and once per month. And elite is on demand - multiple deploys per day.
The second metric within the throughput category is the lead time for changes. How long does it typically take for code to leave the hands of the development team and be successfully running in production?
Think about how quickly you are seeing a return on your investment. You've invested money in your development team creating the change - but then how quickly do you see the benefit from it?
DORA categorise this as the low performers being more than six months. Medium performers being between one month and six months. The high performers being between one day and one week. The elite performers less than one hour.
You can think of the throughput metrics as being almost the "Dev" part of "DevOps". They're about making change - making sure that we can get change done quickly and efficiently.
The second category is Stability. This is the "Ops" part. This is about making sure you have a reliable service.
Stability introduces two more metrics which, as I say, balance out the change aspect of the throughput metrics: time to restore service and change failure rate.
Time to restore service - how long does it typically take to resolve a problem if something breaks? This may be anything from the service being down to some impairment of user functionality. These are periods of time where your organisation is likely to be losing value quickly, be it in revenue or reputation.
DORA defines the low performers as more than six months, the medium performers as between one day and one week, the high performers as less than one day, and the elite performers as less than one hour.
Change failure rate looks at what percentage of releases typically result in service impact that needs remedial work to resolve. On release, does the product fail and need to be rolled back? Or does it impact customer functionality and need a patch or hotfix to be applied to resolve it?
DORA categorise the elite performers as being between 0-15% of releases. For me, I'd certainly expect it to be at the lower end of that range to be considered elite.
And finally, the fifth metric - Reliability.
Where the prior four focussed on software delivery performance, this one focuses on operational performance.
DORA asked the respondents to rate their ability to meet or exceed their reliability targets, be that availability, performance, scalability - whatever the targets were for that organisation.
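To make those bandings a little more concrete, here's a minimal sketch of how you might band a single product against the four delivery metrics using the thresholds described above. To be clear, this is my own illustration rather than anything DORA publish - the function names, the units, and the simplifying assumptions (such as treating "multiple deploys per day" as 30 or more per month) are mine.

```python
# A minimal sketch of banding one product against the four delivery metrics,
# using the thresholds described in this episode. Illustrative only - the
# names, units and approximations are assumptions, not DORA's definitions.

def band_deployment_frequency(deploys_per_month: float) -> str:
    if deploys_per_month >= 30:       # elite: on demand, multiple deploys per day
        return "elite"
    if deploys_per_month >= 1:        # high: between once per week and once per month
        return "high"
    if deploys_per_month >= 1 / 6:    # medium: between once per month and once per six months
        return "medium"
    return "low"                      # low: fewer than once per six months

def band_lead_time_for_changes(hours: float) -> str:
    if hours < 1:                     # elite: less than one hour
        return "elite"
    if hours <= 7 * 24:               # high: between one day and one week
        return "high"
    if hours <= 6 * 30 * 24:          # medium: between one month and six months
        return "medium"
    return "low"                      # low: more than six months

def band_time_to_restore(hours: float) -> str:
    if hours < 1:                     # elite: less than one hour
        return "elite"
    if hours < 24:                    # high: less than one day
        return "high"
    if hours <= 7 * 24:               # medium: between one day and one week
        return "medium"
    return "low"                      # low: more than six months

def band_change_failure_rate(percent_of_releases: float) -> str:
    # The episode only quotes the elite band (0-15% of releases)
    return "elite" if percent_of_releases <= 15 else "below elite"

# A made-up product, assessed per metric
print(band_deployment_frequency(2))        # high - roughly fortnightly deploys
print(band_lead_time_for_changes(3 * 24))  # high - about three days commit-to-production
print(band_time_to_restore(36))            # medium - a day and a half to restore
print(band_change_failure_rate(10))        # elite - 10% of releases need remediation
```

The point isn't the code itself - it's that each metric has a clear banding you can assess a product against, and, as we'll see next, the bands only tell a useful story when read together.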
So let's revisit how these metrics balance themselves out.
As I've said, metrics on their own can be dangerous - they can provide dysfunctional targets. If we had an over-focus on throughput, on delivery frequency, we could get into very much a Wild West approach where anything and everything is just pushed live.
We're just throwing stuff at the wall and seeing what sticks. We can soon find ourselves with a spaghetti mess of systems and an inability to maintain, secure, or even run them properly. It's not uncommon that stability becomes so poor that all development team time is then spent firefighting, just trying to keep the ship afloat.
Whereas if we over-focus on the stability side, we can see overly bureaucratic controls being put in place. We can find those controls stifling innovation and making it difficult to adapt to market pressures.
By including these balancing metrics in the survey, it should help us to understand whether we may be over focussed on one more than another.
Potentially we are being too cowboy about our software - maybe we're putting stuff out without controls.
Or maybe we're constraining ourselves too much and slowing ourselves down.
DevOps provides us with various capabilities to strike the correct balance between control and that innovation that we need in modern business.
I think one of the great value-adds of the State of DevOps report is the ability for you to benchmark your organisation.
So where does your own organisation's journey stack up against other organisations?
What do you look at next? Maybe you're only just looking to start, or maybe you're evolving to the next level. Maybe you're already partway down this journey and consider yourself to be in the low or medium categories. What would you do next to move into the higher categories?
And DORA provides an excellent quick check facility - I'll provide a link in the show notes. It's a simple website, and I'll walk through an example just to show you how quick and easy it is to get an idea.
Again, I would reiterate that you'd want to be doing this per system rather than just at an organisational level.
So with this in mind, let's make up a fake system and see what DORA thinks of how good it is.
Right, the first question: lead time. They ask, for the primary application or service you work on, what is your lead time for changes? That is, how long does it take to go from code committed to code successfully running in production?
Let's say for this, maybe it's one day to one week.
It then asks for deployment frequency. For the primary application or service you work on, how often does your organisation deploy code to production or release it to its end users?
Well, let's say it's between once per week and once per month.
Time to restore. For the primary application or service you work on, how long does it generally take to restore service when a service incident or a defect that impacts users occurs?
Well, let's say maybe this might be a little bit longer, maybe this is one week to a month.
Change failure percentage. For the primary application or service you work on, what percentage of changes to production or releases to users result in degraded service?
Well, I'm fairly happy with this made up service, I'm fairly certain we don't get a lot of problems. So let's say that's in the zero to 15 percent category.
And finally, it allows us to choose an industry, so we can benchmark our results against the industry norm. In this case, I think I'll choose insurance.
Then I simply click to view the results.
So the results are in: my software delivery performance is "Medium". And they're telling me I'm performing better than 40 percent of the state of DevOps respondents.
It gives me an idea of where I fit within that category. It gives me a list of areas for improvement. It gives me a performance comparison not just against other industries, but against my specific industry - in some cases here it looks as if we're actually quite behind. And it gives general advice on what a medium performer looks like, giving me an idea of what I'd need to address to move forward into the higher performance category and ultimately into the elite category.
Again, I would reiterate that if you run this, you want to do it system by system rather than cherry picking. Organisations generally have multiple systems. Some of them will be new and shiny and may very easily fall into this elite category, especially if it's something that's been actively developed and built as we speak.
Some will be coated in a layer of dust and cobwebs and won't have been touched for years. Some you'd almost certainly struggle to do anything with if you actually had a problem, because nobody knows how any more - the people that built it have left.
And this is why I think running through some of these benchmarks is such a valuable exercise. Being able to go through and produce a standardised benchmark per product gives you a way of targeting and focussing - of saying "we're weak in this area", or "that system over there, which from an IT perspective we're covering poorly but from a business perspective is critical - we need to spend time bringing it up to scratch, improving our capabilities within it".
It can be an excellent way to establish a quick overview of the health of your estate of systems.
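If you did capture those benchmark answers for each system, even a very rough summary can make the weak spots obvious. Here's a small illustrative sketch - the system names and their bandings are entirely made up for the example, not taken from DORA - that picks out the weakest metric for each product in an estate.

```python
# Hypothetical self-assessed bands per system - purely illustrative
BAND_ORDER = {"low": 0, "medium": 1, "high": 2, "elite": 3}

estate = {
    "flagship-web": {
        "deployment frequency": "elite",
        "lead time for changes": "high",
        "time to restore": "high",
        "change failure rate": "elite",
    },
    "legacy-billing": {
        "deployment frequency": "low",
        "lead time for changes": "medium",
        "time to restore": "low",
        "change failure rate": "medium",
    },
}

for system, bands in estate.items():
    # The weakest metric for each system is a good first candidate for improvement
    weakest = min(bands, key=lambda metric: BAND_ORDER[bands[metric]])
    print(f"{system}: weakest area is {weakest} ({bands[weakest]})")
```

That's obviously a toy example, but a per-product view like this - however you choose to record it - is what turns the benchmark from a one-off curiosity into something you can prioritise against.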
The report also specifically provides guidance on a number of capabilities to help drive performance, and I'll take a deeper dive into those over the coming weeks. Those include Cloud, Site Reliability Engineering practises (practises that come out of Google, an organisation that has to work hard to cope with scale), the importance of quality documentation, security, technical DevOps capabilities (such as Continuous Integration, automated testing, deployment, etc.) and culture.
In this episode, I've given you a recap of the State of DevOps report - an incredibly rigorous, scientifically backed research project to understand the outcomes being achieved by many, many organisations using DevOps practises.
That 3x lower change failure rate, that 973x more frequent code deployments, that 6570x faster time to recover from incidents, that 6570x faster time from commit to deployment - those are the hooks, the sales pitch, the evidence as to why you'd want to be looking to implement these practises. The justification.
I've talked about the metrics that are used in the survey and thus feed their way through into the report and the subsequent analysis - those two focussed on the throughput of development, the deployment frequency, the lead time for change - those controlling metrics for stability, the time to restore service, and the change failure rate - and the fifth of reliability.
I've talked about how they then work together to give a good idea of the safety and capability of our systems - not just in balancing control and innovation, but in making sure we have a way of looking forward to improve and make our systems better, more performant, and produce better outcomes.
And of course, I touched on the benchmarking capabilities that the DORA team provide so that you can test your own organisation against those metrics to see where you are on your journey of producing better software outcomes, which ultimately would drive better business outcomes.
In the next episode, I'm going to take a deep dive into what the report says about Cloud in terms of helping you to drive performance.
Thank you for taking the time to listen to this episode, and I look forward to speaking to you again next week.