#15 - Monitoring

In last week's episode I talked about software testing - predominantly around automation to aid the flow of delivering value to your customer.

One point I touched on in the episode is that you cannot, and indeed should not, aim or expect to test everything.

While I can understand an expectation or a desire to test everything 100% - it simply isn't feasible.

Pre-release testing is there to gain enough confidence to put the software live.

The second half of that is post-release Monitoring.

And it is Monitoring that I want to cover in this episode.

Published: Wed, 30 Oct 2019 16:46:22 GMT


The dictionary definition of Monitoring is

"observe and check the progress or quality of (something) over a period of time; keep under systematic review."

And that is exactly what we are attempting to accomplish with Monitoring.

While software testing gives us confidence and reduces risk pre-release, Monitoring is our safety net post-release.

We should be using Monitoring to establish if our software is behaving differently than we would have expected.

We should have certain expectations, normally gathered over time, of key indicators.

For example, if you are operating an online sales website, you are likely to expect a certain number of orders per hour. Or maybe you expect a certain basket value or geographic profile.

What would be considered "normal" would be very much dependent on each organisation.

Take for example the number of web sales per hour;

You would expect a significantly different number than the likes of Amazon - it will be dependent on your organisation's history.
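As a rough sketch of what "gathered over time" could look like in practice, the Python below derives an expected range from past hourly order counts. The function name `expected_range`, the sample figures and the choice of "mean plus or minus three standard deviations" are all assumptions for illustration; your organisation may well want a different rule.

```python
from statistics import mean, stdev


def expected_range(history, tolerance=3.0):
    """Return (low, high) as the mean +/- `tolerance` standard deviations.

    `history` is a list of past observations, e.g. orders per hour.
    The +/- 3 standard deviation rule is an illustrative assumption.
    """
    centre = mean(history)
    spread = stdev(history)
    low = max(0.0, centre - tolerance * spread)  # a count cannot go below zero
    high = centre + tolerance * spread
    return low, high


# Made-up sample data: orders per hour observed over previous weeks.
orders_per_hour = [40, 42, 38, 41, 39]
low, high = expected_range(orders_per_hour)
```

The point is only that "normal" comes from your own history, not from anyone else's numbers.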

The key indicators will depend on your individual organisation. They should however be things that support the goals of the organisation.

They should be the most critical aspects that you care about.

These become the metrics central to how we think about and manage the software. For the rest of this podcast I will use the term "metrics" to refer to those key indicators that we are monitoring.

So what should be happening;

Something will be automatically monitoring your software for those metrics. It will then be comparing the periodic results to an expected range - and then alert if that indicator falls outside of that range.
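To make that concrete, here is a minimal sketch in Python of comparing one periodic result against an expected range. The `Metric` class, the `check` function and the example numbers are hypothetical; a real monitoring tool would have its own way of expressing this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    name: str
    low: float   # bottom of the expected range
    high: float  # top of the expected range


def check(metric: Metric, observed: float) -> Optional[str]:
    """Return an alert message if `observed` is outside the range, else None."""
    if observed < metric.low:
        return f"ALERT: {metric.name} = {observed} is below expected minimum {metric.low}"
    if observed > metric.high:
        return f"ALERT: {metric.name} = {observed} is above expected maximum {metric.high}"
    return None


# Illustrative numbers only.
web_sales = Metric(name="web_sales_per_hour", low=35, high=45)
alert = check(web_sales, observed=12)
```

Something would run a check like this on a schedule, once per period, for each metric you care about.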

So back to our web sales indicator;

If that drops beneath your expected threshold, then that could be an indication that the software you've just released is causing problems - definitely a source for investigation.

Maybe that alert automatically notifies the relevant support team to investigate.

Maybe that alert automatically adjusts marketing spend with real time advertisers to limit bad spend.

Maybe that alert automatically reverts the preceding change - on the assumption that it is at fault.

The important thing is that you are made aware as quickly as possible to allow action to minimise any problems.
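Those three "maybe" responses could be wired up along the following lines. This is only a sketch: the function names and the `ACTIONS` mapping are made up for illustration, and each function just returns a description of what it would do rather than calling a real paging, advertising or deployment system.

```python
def notify_support(alert: str) -> str:
    # Placeholder: a real version would page or message the support team.
    return f"notified support team: {alert}"


def limit_ad_spend(alert: str) -> str:
    # Placeholder: a real version would call your advertising platform.
    return f"reduced advertising spend: {alert}"


def revert_last_release(alert: str) -> str:
    # Placeholder: a real version would trigger a rollback.
    return f"reverted last release: {alert}"


# Which actions run for which metric is a decision for each organisation.
ACTIONS = {
    "web_sales_per_hour": [notify_support, limit_ad_spend, revert_last_release],
}


def handle_alert(metric_name: str, alert: str) -> list:
    """Run every action registered for the metric; default to notifying support."""
    actions = ACTIONS.get(metric_name, [notify_support])
    return [action(alert) for action in actions]


outcomes = handle_alert("web_sales_per_hour", "sales below expected range")
```

Whatever the actions are, the design choice is the same: the alert triggers them automatically so that nobody has to notice the problem first.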

Now this may feel like the wrong thing to you;

You are possibly thinking that you would rather expend more effort in upfront testing - with the aim of having zero post-release problems, making post-release monitoring pointless.

And I can understand why you would think that way.

A large portion of the software development industry still thinks that way.

However, it isn't the best way of thinking about it.

Monitoring is not a substitute for pre-release testing. A development team should not think they have a "free-pass" on pre-release quality if a post-release monitoring solution is in place.

That would lead to low quality, an increase in the cost to resolve problems and, ultimately, slower delivery of value to the customer.

In the same respect you shouldn't be trying to test your way out of using monitoring. As I've said you can't test for everything.

To do so would lead to long test times, an increase in the cost of performing them and, again, ultimately slower delivery of value to the customer.

As with all things, a balance needs to be sought.

Use that post-release monitoring to get fast feedback. Then use that feedback to decide if additional pre-release testing is justified. Never take that as an automatic response.

You should also consider the benefit monitoring provides if failures occur that aren't the result of a release.

That same monitoring would alert if there is a server failure in the middle of the night - or if the database crashes - or you are under a denial-of-service cyber attack - basically anything that would drop that metric under that acceptable range.

DevOps is often depicted as an infinity loop made up of 7 stages - one leading to the next, until it loops back round and repeats itself.

A Monitoring stage on that diagram is always shown just prior to the Planning stage.

DevOps expects (or rather demands) that post-release Monitoring is used to provide that feedback to drive the planning for the next iteration.

Monitoring is one of the key components.

This isn't just for confidence that our last release went well.

This is to understand how the system, and ultimately the organisation, is performing against its goals.

Our Monitoring should be on those aspects that underpin the organisational goals.

We should know how they trend over time, and which ones require focus to move us towards the organisational goals.

That is why it's so key to DevOps prior to the planning exercise.

Peter Drucker is credited with the quote:

"If you can't measure it, you can't improve it"

So all that monitoring goes back into our experimental thinking when we plan what to do next.

This is why the DevOps diagram is often represented as an infinity loop.

Sometimes you may feel that you don't receive enough "natural traffic" to provide reliable metrics.

In such a case, you may want to look at Synthetic Transactions.

Synthetic Transactions are "faked" customer actions. So for example, a fake web order is added to the website.

That Synthetic Transaction can then be used by the monitoring to verify everything is working as expected. For example, if you schedule your fake web order to be created every hour, then if the monitoring does not see one for just over an hour, then it will alert as appropriate.
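The hourly fake order check could be sketched like this in Python. The function name `synthetic_order_overdue` and the five minute grace period (my reading of "just over an hour") are assumptions, not something any particular monitoring tool prescribes.

```python
from datetime import datetime, timedelta


def synthetic_order_overdue(
    last_seen: datetime,
    now: datetime,
    interval: timedelta = timedelta(hours=1),
    grace: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if no synthetic order has been seen within interval + grace."""
    return now - last_seen > interval + grace


# Example: the last fake order was seen at 12:00.
last_fake_order = datetime(2019, 10, 30, 12, 0)
on_time = synthetic_order_overdue(last_fake_order, datetime(2019, 10, 30, 12, 50))
overdue = synthetic_order_overdue(last_fake_order, datetime(2019, 10, 30, 13, 10))
```

Here the 12:50 check passes quietly, while the 13:10 check would raise the alert because more than an hour and five minutes has gone by without a fake order.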

This is often compared to the canaries used in coal mines. The canary would be an indicator that the air in the mine was no longer breathable and action should be taken.

A word of warning though; Synthetic Transactions can be complex to use - both in technical impact and logic use.

Take our fake web order example; we don't want our fake order to reduce stock so a prospective customer cannot purchase it.

This can lead to its own complications and should be considered carefully before use.
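One possible way of handling the stock problem, sketched in Python, is to flag the order as synthetic and skip the stock reduction for flagged orders. The `Order` class, the `synthetic` flag and `apply_order` are hypothetical names for illustration, and this is one approach among several rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass
class Order:
    sku: str
    quantity: int
    synthetic: bool = False  # True for "faked" monitoring orders


def apply_order(stock: dict, order: Order) -> dict:
    """Return a new stock dict; only real orders reduce the stock level."""
    updated = dict(stock)  # copy so the caller's dict is not modified
    if not order.synthetic:
        updated[order.sku] -= order.quantity
    return updated


stock = {"widget": 5}
after_real = apply_order(stock, Order("widget", 2))
after_fake = apply_order(stock, Order("widget", 2, synthetic=True))
```

Notice the trade-off: the more places the fake order is treated differently from a real one, the less of the real path it is actually verifying. That is part of the complexity to weigh up.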

In this podcast, I've introduced Monitoring as complementary to pre-release Software Testing.

I've talked about how it is not only a good way of protecting our systems from problems caused by a release, but is also useful for identifying unexpected events during normal running.

I've talked about how we should be monitoring the most important things to our organisation. Those things that underpin our business goals. These things, whatever they are, should be things we want to improve through experimental changes to our system.

Monitoring those things should demonstrate if we are going in the right direction with the software product to support the organisation's goals.

And I've given the name "metrics" to those things.

And I've also briefly introduced the idea of synthetic transactions as a means of ensuring we have something to measure in low volume environments.