#149: Legacy Data - advice for dealing with it

Over the last few episodes, I've focused on legacy software - what it is, how it occurs, and various strategies to deal with it.

Alongside that legacy software is the legacy data - which is arguably more important than the software.

In last week's episode, I talked about why thinking about that legacy data is so important.

In this week's, I will provide advice for dealing with it.

Published: Wed, 14 Sep 2022 16:13:54 GMT


Hello and welcome back to the Better ROI from Software Development Podcast.

Over the last few weeks, I've been running a mini-series looking at Legacy Software - what it is, how it occurs, and various strategies to deal with it. And I've now moved on to looking at the data held in those legacy systems.

In last week's episode, I talked about the importance of thinking about that data when doing any legacy work - after all, the data is likely to have more value than the actual software.

In this episode, I want to provide some advice for dealing with it - and specifically I want to cover:

  • The big bang versus incremental
  • Open items versus full history
  • Duplicate balances
  • And the data left behind.

Let's look at each of these in turn.

Big Bang versus incremental.

When faced with the migration of data between legacy and new systems, or even a large transformation within the same system, I will unsurprisingly favour the little and often incremental approach rather than the big bang everything in one go.

I would recommend following the same principles I use for software development: keep the scope small. It reduces risk, and it allows us to learn at a lower cost.

If we have made incorrect assumptions, we are automatically limiting any negative impacts by having a smaller scope - I'd rather we made a costly mistake with a fraction of our customer base than the entirety of it.
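As a rough illustration of that "fraction of our customer base" idea, a migration could be run in small batches, checking each batch before moving on and stopping at the first problem. This is only a sketch under assumptions: the customer IDs, the `migrate_one` and `verify_one` helpers, and the in-memory dictionaries standing in for the two systems are all hypothetical.

```python
def migrate_in_batches(customer_ids, migrate_one, verify_one, batch_size=2):
    """Migrate customers a small batch at a time, stopping at the first failed batch.

    Returns (confirmed_ids, failed_ids). Later batches are never attempted
    once a batch fails verification.
    """
    confirmed = []
    for start in range(0, len(customer_ids), batch_size):
        batch = customer_ids[start:start + batch_size]
        for cid in batch:
            migrate_one(cid)
        failures = [cid for cid in batch if not verify_one(cid)]
        if failures:
            # Stop here: the blast radius is limited to this one batch.
            return confirmed, failures
        confirmed.extend(batch)
    return confirmed, []


# Hypothetical stand-ins for the legacy and replacement systems.
legacy_system = {"C1": "Ann", "C2": "Bob", "C3": None, "C4": "Dee", "C5": "Eve"}
new_system = {}


def migrate_one(cid):
    new_system[cid] = legacy_system[cid]


def verify_one(cid):
    # Assumption for the example: a missing name counts as a failed migration.
    return new_system.get(cid) is not None


confirmed, failed = migrate_in_batches(list(legacy_system), migrate_one, verify_one)
```

In this made-up run, the bad record in the second batch stops the migration before the last customer is ever touched, which is the learning-with-limited-impact behaviour described above.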

Normally any data migration like this will need to be tied to how you propose to change the legacy software and needs to be considered hand-in-glove with that work. They definitely should not be considered separate activities to be carried out by separate teams. Any work needs to be done in lock-step and with full understanding across the data and the software. This consideration should influence the planned iteration scope, and is thus critical to include in any planning.

There is little point delivering an iteration of the software or data changes when they're not aligned. Otherwise, you are delivering incomplete work, which defeats the aim of little and often delivery to minimise risk and maximise learning.

Conversely, of course, "Big Bang" may seem like an easier course.

It's a once only job, right? We just concentrate on getting it out of the way.

I can understand that thought process. After all, dealing with it in an iterative manner is likely to feel like repeating the same work over and over. Thus, the temptation is to batch it into either a single activity or a small number of large activities. It feels like it's the most efficient thing to do.

But in doing so you are loading all the risk into that one activity. You are making a massive assumption - do you know exactly how it will work?

You have removed any chance for learning except in the event of complete failure and having to revert and start again. This is much the same rationale as for not batching software changes - you want to keep the risk low and the learning high.

And of course, to be able to do a big bang, you are likely to need to wait for most, if not all, of the software changes to be ready, thus delaying getting that software investment into production.

Let's move on to open items versus full history.

When migrating data between systems, there is often a question of "what" to migrate - often drawing lines based on some factor, such as the age of the item or its status. For example, only migrating financial data within the last seven years or only migrating any unpaid invoices.

Often this is done either to lighten the load on any migration activity or simply to avoid seemingly unnecessary clutter in the new system.
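To make that kind of line-drawing concrete, here is a minimal sketch of a selection rule based on age and status. It assumes a hypothetical `Invoice` record and made-up dates; the point is simply that the rule is written down explicitly and that the data not selected is captured too, rather than silently dropped.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    number: str
    issued: date
    paid: bool


def select_for_migration(invoices, cutoff, open_only=False):
    """Split invoices into those to migrate and those left behind.

    An invoice is in scope if it was issued on or after `cutoff` and,
    when `open_only` is set, is still unpaid.
    """
    migrate, left_behind = [], []
    for inv in invoices:
        in_scope = inv.issued >= cutoff and (not open_only or not inv.paid)
        (migrate if in_scope else left_behind).append(inv)
    return migrate, left_behind


# Hypothetical sample data.
invoices = [
    Invoice("A", date(2010, 3, 1), paid=True),
    Invoice("B", date(2020, 6, 15), paid=True),
    Invoice("C", date(2021, 1, 20), paid=False),
]
cutoff = date(2015, 9, 14)  # an assumed seven-year line

by_age, old = select_for_migration(invoices, cutoff)
open_items, rest = select_for_migration(invoices, cutoff, open_only=True)
```

Returning both lists is a deliberate choice in this sketch: whatever falls on the wrong side of the line still needs a plan.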

In some cases, some data will simply not be migrated because the new system does not need it. Say, for example, your organisation previously operated a bill of material system, but it's not been used for over five years. It would be perfectly understandable that you wouldn't want that functionality in any new system and thus have no home for that legacy data.

Whatever the reason, there may be good, legitimate reasons to effectively split the data across systems. The question becomes, what do you do with that remaining data, assuming you cannot simply delete it?

You need to have a plan for it, which I talk about more when I get on to "the data left behind" section.

Again, having a clear idea of what and how much data is to be migrated is important to consider in plans.

I've seen projects nearing the end of a multi-year development having simply never considered this, with various teams holding different assumptions over exactly what was happening. In one case, the development team assumed that much of the legacy data would not be moved - only data defined as currently active or open. As migration neared, it became clear that not only did no one have a clear idea of what was considered "active" or "open", but much of the other data was still required for customer management and critical audit functions.

Hence my comments in the last episode about involving your auditors early.

Let's move on to duplicate balances.

Again, this comes from another example of not including auditors early enough in the work.

A client was nearing migration with a plan to migrate customers from the legacy system to the replacement, with the intention of copying the account balances at that point in time. That seemed sensible, yes?

Until they were pulled up by an auditor, pointing out that by simply duplicating the data, they would be accounting for the same money twice, once in the replacement system and again in the legacy system.

Rather, what they needed to do was transfer that balance between the two systems as a transaction, effectively zeroing it in the legacy system. Thus, the sum of all balances across both systems would remain the same at all times.
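As a rough sketch of that idea, and not a description of what the client actually built, the balance can be moved as a paired debit and credit so the combined total never changes. The account IDs, amounts and function names here are hypothetical, and plain dictionaries stand in for the two systems' ledgers.

```python
from decimal import Decimal


def transfer_balance(legacy, replacement, account_id):
    """Move an account's balance from the legacy ledger to the replacement.

    The legacy balance is zeroed and the same amount is credited to the
    replacement, so the money is only ever counted once.
    """
    amount = legacy[account_id]
    legacy[account_id] = Decimal("0")
    replacement[account_id] = replacement.get(account_id, Decimal("0")) + amount
    return amount


def total(*ledgers):
    """Sum of all balances across every ledger passed in."""
    return sum((sum(l.values(), Decimal("0")) for l in ledgers), Decimal("0"))


# Hypothetical balances in the legacy system; the replacement starts empty.
legacy = {"C001": Decimal("120.50"), "C002": Decimal("-30.00")}
replacement = {}

before = total(legacy, replacement)
transfer_balance(legacy, replacement, "C001")
after = total(legacy, replacement)

# The check an auditor would care about: nothing counted twice, nothing lost.
assert before == after
```

A simple copy would have left 120.50 in both ledgers and inflated the combined total; the transfer keeps it constant whether one account or all of them have been moved.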

Again, this would have been picked up earlier if the auditors had been included earlier in the planning. In this, the auditors are your friends.

Let's move on to the data left behind.

Maybe you've chosen to leave some data in a legacy system, as I discussed in open items versus full history. Or maybe you would like to retain the legacy data as reference, perhaps until you've gained confidence in the replacement system or the migrated data.

In these cases, you need to think about how you will retain access to that data. Will you retain the legacy system to allow you to access that data? Is that even feasible if the legacy system has to be replaced due to some critical problem, such as security risk? And if not the legacy system, do you need a second system to provide access to that data? Or maybe simply loading that data into a data warehouse for reporting is enough.

How you access that data will come down to the organisational needs for that data. I'd recommend considering who will need access to that data, how often and for how long.

For one project, the client hadn't considered this. And while original assumptions were that the data would be infrequently accessed by data analysts, they found that operationally they would actually need to have real time access by their customer services agents for at least the next 12 months - and continued access to that legacy system wasn't an option.

This required the development of a second temporary system to provide access to that legacy data, replicating much of the functionality that customer services agents needed to do their role effectively. This second temporary system was not without significant work and cost to the organisation.

So again, understanding these needs early helps with planning. In some cases, these can have a considerable impact on any business case. You may need to run multiple systems, including the legacy, for a period of time. This could incur unexpected costs, such as:

  • the cost to run,
  • the cost to license,
  • additional training - for example, customer services agents having to be able to use multiple systems,
  • and the cost of keeping all of that data secure.

In this episode, I wanted to cover a number of things to consider and plan for with legacy data. Specifically, I've talked about:

  • big bang versus incremental
  • open items versus full history
  • duplicate balances
  • and the data left behind.

How this is handled will differ every time you do it.

The context will always be different - be it the organisation, the data, the technologies - there are a myriad of different factors.

The key is to consider it early and involve the auditors in those conversations. Leaving this too late can lead to painful problems at the 11th hour. Not including the auditors, those who will effectively mark your migration work, leaves you open to not just struggling with audits, but potentially being unable to pass them at all.

Think about this as preparing for an exam. If you know the criteria on which the exam will be marked, then you can focus your revision to get the best return on your time.

Thank you for taking the time to listen to this episode and I look forward to speaking to you again next week.