#134: DevOps Topologies - Working types

In this episode I want to continue the talk about the team structures discussed on https://web.devopstopologies.com/ - with a focus this week on the topologies that are shown to work.

The devopstopologies.com website is based on the work by Matthew Skelton & Manuel Pais. I introduced Matthew and Manuel as authors of the Team Topologies book in the last episode. The work is a precursor to the Team Topologies book - and work that I feel stands on its own two feet - with a specific look into how teams should work together to gain benefits from DevOps.

Published: Tue, 31 May 2022 15:56:53 GMT

Transcript

Hello and welcome back to the Better ROI from Software Development Podcast.

In last week's episode, I started to introduce the advice on team structuring from DevOpsTopologies.com, and I talked through the Anti-Types that they define:

Dev and Ops silos
Dev and DBA silos
DevOps team silo
No Ops needed
Embedded Ops
DevOps as a tools team
Rebranded sysadmin
and fake SRE

In this week's episode, I want to talk through the topologies that have been shown to work.

But first, a bit of a recap: what is DevOps?

I love the Microsoft definition of this:

"A compound of development (Dev) and operations (Ops), DevOps is the union of people, process, and technology to continually provide value to customers."

It's a marriage of traditionally opposing forces: the innovation and change of Dev, and the stability and limiting of change of Ops.

DevOps Topologies provides examples of how those two teams should and should not interact. The website is based on the work by Matthew Skelton and Manuel Pais. I introduced Matthew and Manuel as the authors of the Team Topologies book back in episode 132.

The work on DevOps Topologies is a precursor to the Team Topologies book, but it is work that, I feel, stands on its own two feet with a specific look into how teams should work together to gain benefit from DevOps.

The website introduced DevOps with this:

"The primary goal of any DevOps effort within an organisation is to improve the delivery of value for customers and the business, not in itself to reduce costs, increase automation, or drive everything from configuration management; this means that different organisations might need different team structures in order for effective Dev and Ops collaboration to take place."

The site also goes on to say:

"Remember: There is no "right" team topology, but several "bad" topologies for any one organisation."

In the last episode I looked at the Anti-Types they define - commonly seen bad team structures that help us understand what NOT to do. In this episode, I'll be looking at some examples of topologies that can work:

Dev and Ops collaboration
Fully shared Ops responsibilities
Ops as Infrastructure-as-a-Service
DevOps as an external service
DevOps team with an expiry date
DevOps advocacy team
Site Reliability Engineering team
Container driven collaboration
And Dev and DBA collaboration.

I'll go through each of these in turn.

The Dev and Ops collaboration topology.

DevOpsTopologies.com describes this as the "promised land" - with there being clear collaboration between Dev and Ops. They go on to say that the teams must have clearly expressed and demonstrably effective shared goals, such as "delivering reliable frequent changes".

The key here is the word "shared".

The dysfunctional opposing goals need to be aligned, otherwise that "them" and "us" remains. Both teams must be comfortable with recognising each other's strengths and meeting each other where they are.

The site does warn this is likely to need a quite substantial organisational change to establish it and a good degree of competence higher up in the technical management team.

The Fully shared Ops responsibilities topology.

This is similar to the Dev and Ops collaboration, but rather than some shared responsibilities, they are effectively one team. This is where you may have a product development team that is truly cross-functional and includes all the relevant skills needed to deliver quality, reliable products.

For me, this is the ideal as the team is completely empowered for pretty much everything.

However, there may be a question over how practical this is for most organisations. There will obviously be a budgetary overhead in having that level of operational knowledge on every product team - and in many cases this will just be prohibitively expensive for the benefit achieved.

In those cases, I'd personally recommend the Dev and Ops collaboration and ensuring shared responsibilities for as much as is financially practical.

The Ops as Infrastructure-as-a-Service topology.

This for me is a topology that may be a stepping stone to moving to that Dev and Ops collaboration model. This can work well if we have an existing fairly traditional Ops team that are unable to change rapidly enough for the organisation. In this structure, a DevOps team is created within the Dev team and they interact with the Ops team at the infrastructure level similar to how you interact with maybe a cloud provider.

This isn't going to be perfect as you aren't getting Ops bought into the same shared goals. But it can be a useful way to bridge the gap until such point as the organisation can move to full Dev and Ops collaboration.

The DevOps as an external service topology.

This is similar to the Ops as Infrastructure-as-a-Service, but using a third party provider. The third party provider helps you to establish those working practices and provides both the Ops and most of the DevOps capabilities.

The website recommends this approach if your organisation maybe doesn't have the Ops experience in-house, either due to size, finance or age. By leveraging a third party, you are taking advantage of their expertise and best practice.

The key for me is making sure that the communication and the goals are shared appropriately with the third party. It can easily turn into a supplier/consumer model with each trying to get the better of the other. This needs to be a partnership with the teams working well together.

Again, this could be seen as a stepping stone to that Dev and Ops collaboration model within the organisation - with the organisation recruiting their own teams to slowly take over the responsibilities of that third party.

The DevOps team with an expiry date topology.

Now this is an adaptation of the Anti-Type "DevOps team silo" that I talked about in the last episode. A third team is stood up called DevOps, which sits between Dev and Ops.

The key here is that the aim of that DevOps team is to make itself obsolete. The team should first act as a translator between Dev and Ops, but with the aim of moving that work into a shared responsibility model where over time they can step further and further away - letting Dev and Ops talk to each other directly.

This can be useful, as a DevOps team have a goal of bringing the two teams together and thus can focus on it - whereas the Dev and Ops teams themselves will likely push it to the back burner due to other demands.

The danger, of course, is that the DevOps team becomes a permanent fixture, in which case we've made everything worse by having 3 teams coordinating poorly, where previously we only had 2.

The DevOps advocacy team topology.

This shares a lot of similarities with the DevOps team with an expiry date model, but is expected to have a longer lifespan. The key difference is that this team is here to facilitate the collaboration between the two teams, not actually do it for them.

They will evangelise and promote DevOps practices and they will help the Dev and Ops teams to remain in collaboration, even if that is with the help of that DevOps advocacy team.

I think about this more like coaching than doing - a personal trainer will coach you on your exercise regime, but they can't do the exercises for you.

Of course, the danger here is that the DevOps advocacy team simply do the work - thus taking us back to that problem of having 3 siloed teams rather than 2.

The Site Reliability Engineering team topology.

I've talked about the Site Reliability Engineering principles and practices from Google in previous episodes - and the SRE team topology is slightly different from most of the other DevOps models.

The SRE team are effectively responsible for running software at scale within Google. The SRE team are a specialised part of the Ops team - with considerable engineering maturity. They work closely with the Dev team to provide the DevOps functionality to allow a product to run successfully at Google Scale.

While not all Google products work in this way, I believe that most of the major products do.

In this model, the operational responsibility is shared between the Dev and the SRE team. But realistically, it's the SRE team that take the brunt of the problem handling and as such the SRE team have considerable authority to reject substandard products or make recommendations for the operational performance of a product.

The SRE team avoids the "them" and "us" by working in close collaboration with Dev and Ops, with aligned goals for the product success.

One of the reasons this works at Google is that the SRE practice is very mature and the organisation values its work, and thus gives it the authority needed to do its job.

Care would need to be taken in implementing SRE in a less mature organisation where they're not providing the authority to do the job correctly. In that situation we are back to creating a third team with conflicting goals.

The Container driven collaboration topology.

I introduced containers back in episode 43. The idea is that our software is shipped in standard uniform "containers" - very much like shipping containers. That uniformity helps us to provide a logical boundary between Dev and Ops.

Dev have the responsibility of correctly creating the container - with the relevant manifest of runtime requirements for it, such as the size of server, the speed, the memory, etc.

And Ops have the responsibility of providing the platform that allows those containers to operate.
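
As a rough illustration of that boundary, here is a minimal Python sketch. The names (`ContainerManifest`, `Platform`, `can_host`, `deploy`) and the resource fields are hypothetical, made up purely for this example; they are not taken from the DevOps Topologies website or from any real container tooling. The point is only that Dev declares what the container needs, and Ops decides whether the platform can run it without needing to know what is inside.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerManifest:
    """Dev's side of the boundary: what the container needs at runtime (hypothetical fields)."""
    image: str
    cpu_cores: float
    memory_mb: int


@dataclass
class Platform:
    """Ops' side of the boundary: the capacity the platform currently has free (hypothetical)."""
    free_cpu_cores: float
    free_memory_mb: int

    def can_host(self, manifest: ContainerManifest) -> bool:
        # Ops only reads the manifest; it needs no knowledge of the software in the container.
        return (manifest.cpu_cores <= self.free_cpu_cores
                and manifest.memory_mb <= self.free_memory_mb)

    def deploy(self, manifest: ContainerManifest) -> None:
        # Refuse containers whose declared needs exceed what the platform has free.
        if not self.can_host(manifest):
            raise RuntimeError(f"Not enough capacity for {manifest.image}")
        self.free_cpu_cores -= manifest.cpu_cores
        self.free_memory_mb -= manifest.memory_mb


platform = Platform(free_cpu_cores=4.0, free_memory_mb=8192)
web = ContainerManifest(image="example/web:1.0", cpu_cores=1.5, memory_mb=2048)
platform.deploy(web)
print(platform.free_cpu_cores, platform.free_memory_mb)
```

In this sketch the manifest is the only thing that crosses between the two teams, which is the "logical boundary" idea. If Dev understates what the container really needs, the platform check passes but the product still suffers.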

Personally I can see benefit in this approach. But as the DevOps Topologies website warns, it requires a sound engineering culture. If Dev starts to ignore operational considerations, this model can revert towards an adversarial "us" and "them" situation.

The Dev and DBA collaboration topology.

In the last episode I talked about the Anti-Type of Dev and DBA silos - we have our Dev and our Database Administrators (DBAs) acting as siloed teams. Now this can occur in large organisations that have a lot of data. Due to the amount or importance of the data, we use a dedicated DBA team to manage and maintain that data. Now, this often leads to the DBAs becoming gatekeepers to any change made by Dev that would affect data - which is likely to be pretty much everything.

The Dev and DBA collaboration topology attempts to resolve these problems by having members of the Dev team with database capabilities. We aim to pair them up with DBAs from the Ops team. The aim is to foster a close working relationship between those members of the two teams, and by doing so, establish shared responsibility between them.

Ultimately, this is about collaboration.

In this episode, I continued to look at some of the team structuring coming out of the DevOps Topologies website. While in last week's episode I talked about the Anti-Types (the bad examples), in this episode I looked at some topologies that can work:

Dev and Ops collaboration
Fully shared Ops responsibilities
Ops as Infrastructure-as-a-Service
DevOps as an external service
DevOps team with an expiry date
DevOps advocacy team
Site Reliability Engineering team
Container driven collaboration
And Dev and DBA collaboration.

There is definitely some commonality in these approaches, as much of it is about increasing that collaboration and having shared goals - ultimately breaking down those barriers that we naturally create for our tribes.

The other commonality is how easy it is to move from one of these into an Anti-Type - something that we need to be constantly monitoring for.

Thank you for taking the time to listen to this episode. I look forward to speaking to you again next week.
