

Is Dirty Data used by the Government costing the NHS more than £1 billion?

After hearing a report on the BBC Radio 4 statistics show ‘More Or Less’, we became suspicious of the official Covid vaccination data being used by the government. It made us question whether the government has been using unclean, unreliable data to estimate Covid vaccination statistics.

Some of the statistics released last year made assumptions regarding the number of unvaccinated people. However, the ‘More Or Less’ programme revealed that the number of unvaccinated people in England is actually unknown, so any subsequent statistics cannot be relied upon.

It should be easy to work out the number of unvaccinated people in a population. Records show how many people we have vaccinated, so it is a simple matter of subtracting that number from the total population.

Except that we don’t know the exact population at the moment. The details of the 2021 census are not available at the time of writing (January 2022). There is, however, a population estimate available from the Office for National Statistics (ONS). It is not perfect either, as it is based on an estimate of the population from over a year ago, but we believe it is the best estimate available.

For some reason, Public Health England (a government agency that no longer exists) didn’t use this estimate to work out the number of unvaccinated people.

Instead, they used the number of people registered with doctors and pulled that into a database called the National Immunisation Management System (NIMS).

It turns out that using the number of people registered with a doctor is a spectacularly inaccurate way to estimate the population. For example, some people are registered with more than one doctor (students are often registered twice), or they may be registered with a doctor but have left the country.
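To illustrate why counting registrations overstates a population, here is a minimal sketch. The records, the `person_id` field and the practice names are all hypothetical, and it assumes every person carries one reliable identifier, which real-world matching often cannot rely on.

```python
from collections import Counter

# Hypothetical registrations: person "A" is a student registered at two practices.
registrations = [
    {"person_id": "A", "practice": "Home town surgery"},
    {"person_id": "A", "practice": "University health centre"},
    {"person_id": "B", "practice": "Home town surgery"},
    {"person_id": "C", "practice": "City practice"},
]

# Count how many registrations each person has.
counts = Counter(record["person_id"] for record in registrations)

registered_count = len(registrations)   # what a naive count would report
unique_people = len(counts)             # how many people actually exist
duplicates = {pid: n for pid, n in counts.items() if n > 1}

print(f"Registrations: {registered_count}, unique people: {unique_people}")
print(f"People registered more than once: {duplicates}")
```

People who have left the country would not be caught by this kind of check at all, which is why matching against other sources also matters.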

How inaccurate is this estimate of the population? There are about 6 million more people in the NIMS database than in the ONS population estimate.
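The effect on the unvaccinated figure can be sketched as follows. The population and vaccinated counts below are illustrative placeholders, not official figures; only the 6 million gap comes from this article. Whatever the vaccinated count is, the larger denominator adds the whole gap to the unvaccinated estimate.

```python
def unvaccinated(population: int, vaccinated: int) -> int:
    """Estimate unvaccinated people as population minus vaccinated."""
    return population - vaccinated

# Illustrative placeholders only, NOT official statistics.
ons_population = 50_000_000
vaccinated = 40_000_000

# The only figure taken from the article: NIMS holds about 6 million more records.
nims_population = ons_population + 6_000_000

gap = unvaccinated(nims_population, vaccinated) - unvaccinated(ons_population, vaccinated)
print(f"Extra 'unvaccinated' people from the larger denominator: {gap:,}")
```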

In the government’s own Transparency and Data Vaccines Report, issued in November 2021, they accept that the data is likely to contain errors, but believe it is better to use this data than the ONS data.

Clearly something needs to be done to the NIMS database before it can be used effectively.

We were not confident that sufficient work had been done to clean the database, so we submitted a Freedom of Information request asking how often the NIMS database was cleaned. It turns out that the NIMS database is actually a copy of another database called the NHS Digital Personal Demographic Service (PDS). So we asked how often the PDS database was cleaned. They wanted a definition of ‘cleaned’ and also, somewhat surprisingly, asked what we meant by Suppression Files.

Once we supplied this basic information, NHS Digital responded by stating that the PDS was cleaned against notification of deaths from various sources. They also said that “where duplicate records are identified they are investigated” and in the last two years they have removed 85,729 records.

Given that the discrepancy between the NIMS/PDS database and the ONS population estimate is 6 million, that potentially still leaves more than 5.9 million excess records.
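That arithmetic can be checked in a few lines. This is a minimal sketch using only the two numbers quoted above; the 6 million discrepancy is a rounded figure, so the result is approximate, and the variable names are our own.

```python
# Figures quoted in this article; the discrepancy is a rounded estimate.
nims_ons_discrepancy = 6_000_000  # NIMS records in excess of the ONS estimate
removed_duplicates = 85_729       # records NHS Digital reported removing in two years

remaining_excess = nims_ons_discrepancy - removed_duplicates
share_removed = removed_duplicates / nims_ons_discrepancy

print(f"Remaining excess records: {remaining_excess:,}")
print(f"Share of the discrepancy removed: {share_removed:.1%}")
```

On these figures, the records removed account for well under 2% of the gap.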

So what is the cost to the nation of overestimating the population by 5.9 million?

  • Are we over-buying vaccines based on this figure?
  • Are we sending duplicate vaccine invitation letters out? Based on 30p per letter, this could be costing an additional £1.77 million or so per mailing, in excess postage alone.
  • Are we spending large amounts on digital and SMS marketing messages to this non-existent 5.9 million?
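The postage estimate in the list above can be reproduced with a short calculation. This is a rough sketch that assumes one letter per excess record and the 30p per letter used above; the function name is ours, and working in whole pence avoids floating-point rounding.

```python
def excess_postage_pounds(excess_records: int, pence_per_letter: int) -> float:
    """Cost in pounds of sending one letter to every excess record."""
    total_pence = excess_records * pence_per_letter
    return total_pence / 100

# 5.9 million excess records and 30p per letter, as quoted in this article.
cost_per_mailing = excess_postage_pounds(5_900_000, 30)
print(f"Excess postage per mailing: £{cost_per_mailing:,.0f}")
```

Each further mailing to the same list repeats that cost.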

Before embarking on any major use of a database, we always advise that steps are taken to ensure it is clean, current and fit for purpose. We would recommend that organisations do this before carrying out a marketing campaign or a corporate communication exercise. Doing so not only ensures that any analysis is correct, but can also SAVE MONEY.

In conclusion, we can only assume that the various government departments are not supporting the NHS with good quality data. The government knows that the data is not good quality, but it is still using it to generate some very important statistics. Why not just clean the data? We believe this is serious, because they are dealing with information that could be used to save lives.

How important is it to use clean data? It’s this important.

If you are worried about the quality of your data, try our Free Data Quality Check. It provides a quick, simple and no-strings-attached way to test the quality of your data.

Updated 22nd February 2022.