SITRA: At a library computer in Espoo, 10 November 2014. Photo: Sari Gustafsson

Published January 11, 2017

Where do these errors come from?

High-quality data does not come easily – it requires lots of hard work and co-operation.
Writer
Senior Lead, The Digital Health Hub, Sitra
Juhani Luoma-Kyyny is the information systems architect of the Isaacus project.

The Isaacus – Digital Health Hub currently under development will combine well-being databases and their users in a seamless and safe manner. This will allow the engagement of individuals, the promotion of well-being and the creation of a well-being and health ecosystem that is constantly learning. The Isaacus blog series gives experts the opportunity to shed light on the topic from their own perspective.

Data quality should be everyone’s concern

The early November weather lifted the spirits of all friends of winter and snow. That is, skiing fanatics, children and everyone who is in the winter tyre business. But weather, as we know, is predictable and after a few days the rain poured down and washed away the snow, bringing us the usual slushy November. But boy, did we have some action-filled moments!

This annual spectacle reminded me of the early days of our Isaacus project. Blogs announced that we live in the land of milk and honey, sitting atop a treasure chest filled with high-quality research data, statistical materials and national registers. Hooray! Way to go! But only too soon those who know better were wondering whether the quality of the data was actually so good, or whether the treasure chest was actually filled with fool’s gold. Sigh. November slush, again. Perhaps we should not aim too high but concentrate on more down-to-earth things that we understand, like potato cultivation.

Then again, good discussions include various opinions and if we can reach a dialogue, we might even learn something. There might be a better way ahead after all! But no, the discussion quietened down. What should we think of this? Is silence golden or does silence imply consent? Argumentum ex silentio…

The plot is familiar, though. Where data quality is concerned (as it is here), there are certain predictable phases. First, vivid discussions and strong opinions on how things should and could be. Then, loud voices recommending certain tools and even methods. But when it is time to actually do something, the volunteers seem to disappear. With the exception of software salespeople and consultants.

The operation of the Digital Health Hub will concentrate on data and information; data being the raw material and information being the processed outcome with added value. A treasure chest might be a slight exaggeration and does not describe the whole truth. Yes, there is data but it is located in various systems and in isolated environments. Data integration has certain issues even in the traditional enterprise environment but it seems to face unprecedented challenges within the Digital Health Hub. Data sources, for instance, are located outside the hub – physically and administratively.

Well, what’s the problem? Improving data quality is basically fixing an error when you find it. A piece of cake. And when the same error appears again, possibly in a different form, you fix it when you find it. Business as usual. But at some point this tedious repetition starts to get on your nerves – enough is enough. You grab the phone and call the data provider: please do something! But life repeats itself and the data provider ends up fixing the same errors again and again. Where do these errors come from? Now we stand in front of one of the cornerstones of data quality – the identification of root causes, that is, the real reasons for errors.

A good (or bad) example is an end-user interface for entering data. A mandatory field, but one may have no idea what to enter. What now? No worries, the EUSS (End User Secret Society) standard procedure is to enter three consecutive 1s and press “OK”! Solving problems like this requires more than technology – it requires changes in processes and human behaviour. And end users are not to blame; they just try to survive in a hectic environment, using poorly designed information systems.
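As a minimal sketch of how such placeholder entries might be spotted after the fact, the Python snippet below flags values that consist of a single repeated character, such as "111". The function name, the sample records and the field name `diagnosis_code` are hypothetical and chosen only for illustration; none of them come from the Isaacus project.

```python
def looks_like_placeholder(value: str, min_length: int = 3) -> bool:
    """Return True if the value is a run of one repeated character, e.g. "111"."""
    text = value.strip()
    return len(text) >= min_length and len(set(text)) == 1


# Hypothetical records; the field name is an assumption made for this example.
records = [
    {"id": 1, "diagnosis_code": "J45"},
    {"id": 2, "diagnosis_code": "111"},
    {"id": 3, "diagnosis_code": "xxx"},
    {"id": 4, "diagnosis_code": "E11"},
]

# Collect the ids of records whose mandatory field looks like filler.
suspicious_ids = [
    r["id"] for r in records if looks_like_placeholder(r["diagnosis_code"])
]
print(suspicious_ids)  # [2, 3]
```

A check like this only finds the symptom. It says nothing about why the end user had nothing sensible to enter, which is where the process and design changes come in.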

Simply put, a data quality improvement process means that you recognise the impact of poor-quality data, identify the root cause, make corrections to existing data and prevent new errors. Child’s play! But as mentioned before, data quality improvement isn’t easy within a regular enterprise – not to mention the environment of the Digital Health Hub. It is obvious that the traditional methods are not adequate. We are probably able to recognise the impact of poor-quality data, possibly identify the root causes and even design the corrective actions, technologically. But if changes are required in existing operational systems, we might have a “mission impossible”. Not to mention re-designing processes.
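The four steps named above can be walked through on toy data. The sketch below is an illustration under stated assumptions, not a description of how the Digital Health Hub works: the source names `clinic_a` and `clinic_b`, the field `code` and the junk value "111" are all made up, and counting errors per source gives only a first hint of where a root cause might lie.

```python
from collections import Counter

PLACEHOLDER = "111"  # assumed junk value, borrowed from the "three 1s" habit

# Hypothetical records from two made-up source systems.
records = [
    {"source": "clinic_a", "code": "J45"},
    {"source": "clinic_a", "code": "111"},
    {"source": "clinic_b", "code": "111"},
    {"source": "clinic_b", "code": "111"},
]


def is_valid(code):
    """Prevent new errors: a rule that could reject junk at entry time."""
    return code is not None and code != PLACEHOLDER


# Recognise the impact: what share of the data is affected?
bad = [r for r in records if not is_valid(r["code"])]
share_bad = len(bad) / len(records)

# Look for the root cause: in which source do the errors cluster?
errors_by_source = Counter(r["source"] for r in bad)

# Correct existing data: mark junk as missing rather than guess a value.
cleaned = [
    {**r, "code": r["code"] if is_valid(r["code"]) else None} for r in records
]

print(share_bad)                        # 0.75
print(errors_by_source.most_common(1))  # [('clinic_b', 2)]
```

The technical part fits in a few lines. The hard part, as the text says, is that the validation rule has to live in an operational system that the hub does not own.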

What now? Pray for a miracle? Or something completely different – how about some hard work and co-operation, just like in the Isaacus project?

#isaacus
