The Census is not completely fixable

Statistics NZ has finally announced that we will get the first tranche of Census 2018 data on September 23 – a mere 566 days after we filled in the forms. But concerns remain with some indicators. Which raises the big questions – what data will be released, and can we trust this census?

What went wrong?

A low census response rate left a patchy dataset, which Statistics NZ has been scrambling to fill using fancy modelling and administrative data.

The administrative data used is the Integrated Data Infrastructure (IDI) – a dataset that links together several government sources into a centralised database about people and households.

The importance of this administrative data for repairing the Census can’t be overstated.

Statistics NZ used IDI to engineer 11% of the people counted in the ultimate census dataset, which equates to 576,000 people. The remaining 89% of individuals identified during Census 2018 came from individual forms (85%) and from household forms (4%).

How is it being fixed?

This splicing together of census forms and administrative data to count the population might sound a bit like alchemy to some.

I guess it kind of is, but even so, my conversations with Statistics NZ and reviews of their methodology have convinced me that the techniques employed to count people have been sound.

The administrative population count starts with a spine of data from births, visas, and tax records. People who have died or left the country are then weeded out using death records and border movements, while other datasets such as tax, health, benefits, education and ACC records provide signs of whether a person has recently been active in New Zealand.

This administrative population count was then matched against what the Census had captured to fill in gaps in the overall population count. The result was a more comprehensive snapshot than a field-gathered census alone would have provided.
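To make the logic concrete, here is a minimal sketch of the two steps in Python. It is a toy illustration only, not Statistics NZ's actual method: the function names, the person identifiers, and the idea that every dataset shares a clean common ID are all assumptions (real record linkage is far messier), as is treating recent activity as a hard filter.

```python
# Toy sketch of an administrative population count.
# All names and identifiers here are hypothetical.


def administrative_population(births, visas, tax, deaths, departures, active):
    """Return the IDs judged to be resident, using simple set logic."""
    # The spine is everyone seen in births, visas, or tax records.
    spine = set(births) | set(visas) | set(tax)
    # Weed out people who have died or left the country.
    remaining = spine - set(deaths) - set(departures)
    # Assumption: keep only people showing recent activity in other datasets.
    return remaining & set(active)


def fill_census_gaps(census_ids, admin_ids):
    """Combine census respondents with people found only in admin data."""
    census_ids = set(census_ids)
    added = set(admin_ids) - census_ids  # people the census forms missed
    return census_ids | added, added


admin = administrative_population(
    births={"p1", "p2", "p3"},
    visas={"p4", "p5"},
    tax={"p2", "p5", "p6"},
    deaths={"p3"},
    departures={"p4"},
    active={"p1", "p2", "p5"},
)
combined, added = fill_census_gaps(census_ids={"p1", "p7"}, admin_ids=admin)
print(sorted(admin), sorted(added), len(combined))
```

In this toy example, two people (p2 and p5) appear only in the administrative data and are added to the two people who returned census forms, giving a combined count of four.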

What characteristics are measurable using administrative data?

Statistics NZ’s hybrid technique is fantastic for counting people, but it starts to hit its limits as you try to learn about the characteristics of those individual people.

The extent of those limitations is ultimately determined by the sorts of personal details consistently captured by government departments. As you can imagine, administrative data performs well for core demographic details such as age and sex, given those details are captured on most government forms.

The data can also help with understanding ethnicity, but only at a high level. Government forms generally only ask for ethnicity across broad categories such as Asian, African and European, which isn’t conducive to understanding things like the number of people of Thai compared to Filipino descent.
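The loss of detail can be shown with a small Python sketch. The mapping and the counts below are made-up numbers for illustration only, and the function name is hypothetical; the point is simply that once groups are rolled up into a broad category, the detail cannot be recovered.

```python
from collections import Counter

# Hypothetical mapping from detailed ethnic groups to broad categories.
BROAD_GROUP = {"Thai": "Asian", "Filipino": "Asian", "Dutch": "European"}


def to_broad(detailed_counts, mapping):
    """Roll detailed counts up into broad categories."""
    broad = Counter()
    for group, count in detailed_counts.items():
        broad[mapping[group]] += count
    return broad


# Made-up counts of the kind a census form could capture.
census_style = {"Thai": 120, "Filipino": 300, "Dutch": 80}

# What is left when only the broad category is recorded.
admin_style = to_broad(census_style, BROAD_GROUP)
print(dict(admin_style))
```

The rolled-up result has a single Asian total, with no way to tell how much of it is Thai and how much is Filipino.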

Administrative data is helpful for pinning down where people live. People can also be grouped together in households using the data, but there can be challenges sorting out households that include younger people. For example, I left Invercargill at 18, but didn’t change most of my addresses with government departments from the family abode to where I really live until I hit 30.

Low-hanging fruit to be released first

The first tranche of Census 2018 data is scheduled to be released on September 23.

This release will be centred around the easy-to-fix indicators described above and will include:

  • Usually resident and census night population counts
  • Electorate populations for general and Māori electorates
  • Dwelling counts

These indicators are Tier 1 statistics and are being rushed out first to fulfil statutory requirements to support the setting of electorate boundaries ahead of the 2020 General Election.

After that things get dicey

Beyond Statistics NZ’s initial data release, the details get murky. To say there is an information vacuum is putting it lightly.

Even Statistics NZ does not know at this stage which indicators they will be able to ultimately release and when.

All Statistics NZ has been able to say is they will finally announce a release schedule by the end of July.

I strongly suspect that the ultimate list of indicators in that release schedule will disappoint.

The problem is that administrative data has its limitations for filling in very specific characteristic details about people.

We saw that age, sex, and location are easy as they are captured in most forms.

Other datasets, like tax records, also give Statistics NZ a treasure trove of insight by clarifying what people earn and what industry they work in.

But there are some nitty-gritty details that could only have been found out through the carefully worded interrogation of the census form.

Administrative data is of little use for understanding:

  • How people use their time
  • Whether people volunteer
  • Whether their home is damp
  • How many rooms are in their home
  • How they travel to work
  • Whether they smoke
  • What languages they speak
  • The specifics of their iwi affiliations
  • Whether they have disabilities

Insights regarding the topics above will be challenging for Statistics NZ to estimate. As a result, some data may be withheld. And in cases where data is released on those topics, it may be abridged to a higher level, or be accompanied by carefully scripted caveats about estimation errors.

Beyond those obvious topics of concern, I also remain sceptical regarding skills and occupation data. Statistics NZ seems pretty sure they can still get good insights on qualifications and occupations from administrative data and past censuses. But I am not so sure.

Administrative data only captures qualifications for people who were educated in New Zealand, so it will miss the qualifications of the migrant population.

Occupations, on the other hand, might be adequately informed by past censuses for jobs that are consistent through time. But what about jobs that didn’t exist five years ago, like social media advisers? Statistical imputation from old data is surely not going to help us understand whether the future of work is already showing up among some specific occupations.
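A small Python sketch shows why. It uses a simple carry-forward rule, where a missing occupation is filled with the person's answer from the previous census. That rule is my assumption for illustration, not a description of how Statistics NZ imputes occupations, and the people and jobs are invented.

```python
def impute_occupation(current, previous):
    """Fill missing occupations with the person's previous census answer."""
    imputed = {}
    for person, occupation in current.items():
        if occupation is None:
            # Carry the old answer forward; stays None if there is no past record.
            occupation = previous.get(person)
        imputed[person] = occupation
    return imputed


# Invented answers from the previous census.
previous = {"a": "teacher", "b": "accountant", "c": "journalist"}

# Invented current census: b and c did not respond.
# Suppose c has in fact become a social media adviser since then.
current = {"a": "teacher", "b": None, "c": None}

result = impute_occupation(current, previous)
print(result)
```

Person c is recorded as a journalist, because an occupation that was never observed in the old data cannot come out of it. More sophisticated imputation models share the same basic constraint.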

What can you do about it?

At this stage, the only given is that you must wait until the end of July to find out the final suite of indicators that will be released by Statistics NZ.

But in the meantime, you should also prepare yourself to be disappointed.

Although Statistics NZ is likely to set a very high bar before it completely withholds an indicator, any abridging of data to a higher level will have huge effects on monitoring what matters for people in the regions.

Many regions have been relying on the Census to help flesh out their understanding of the wellbeing of their residents. And this understanding relies on specific detail.

The good news is that the Census need not be the only show in town when it comes to understanding wellbeing in your region.

Other options do exist; they just require a little more creative thinking to get your hands on the insight. For example, you can fill in gaps with proxies and other government datasets, as well as local surveys.

Over the coming months I look forward to working with businesses, local authorities and development agencies as they map out a way forward.