How policy and science can work towards a rich data landscape in Europe

Even when respecting legitimate privacy concerns, the accessibility and use of register data for research purposes could be fundamentally improved in Europe. This would not only help European research to stay competitive, but also improve science-informed policy planning.

In research, there is wide interest in expanding the use of register data. This is already happening in Europe, but challenges remain, many of them related to legislation, privacy and ethics. Concrete discussions are under way, and regulation at the European level is gradually being harmonised, for example through common rules on fair access to and use of data. However, countries still differ considerably in how they implement the few existing EU data regulations, and this is likely to continue for many years to come.

Smooth collaboration and trust between scientists and policymakers are critical in this regard. As these EU regulations need to be implemented in different countries, there should be more input from the research communities. Issues that are minor problems or non-issues from a research perspective often end up dominating legislative discussions. Moreover, the challenges of open access to data are not just about the legal framework; they are also about technical issues that need to be solved in practice while doing research.

Almost everywhere in academia, steps forward are being taken. The changes are taking place in three stages. First, the research use of interlinked registers is expanding. This is an ongoing process, with some countries just starting out and others with decades of experience. Second, register data is being supplemented by other sources, such as contextual data, surveys, or unstructured data from, for example, text or images. This has only just begun, and has enormous potential for scientific breakthroughs.

The third stage has not yet been reached anywhere, but is already on the horizon: the synthetic combination of registers and other data sources. This can dramatically improve the coverage of different types of data that are otherwise missing. More importantly, it can be used to create fabricated but statistically identical copies of real data, avoiding the privacy issues that still exist in the extensive combinations of original data sources.

What is needed to accelerate register-based research? Sharing original data is difficult, especially when it comes to individual-level information, mostly because of legitimate privacy concerns. But many other valuable elements could be shared more widely: for example, good data processing and storage practices, coding, and analytical and methodological perspectives.

Finally, it is important to recognise that knowledge and skills are needed to develop register-based research. This is partly due to poor data documentation, which makes collaboration, comparison and interpretation difficult. But we also need to learn the necessary skills: neither the methods nor the datasets themselves are easy to use, and they need to be taught to new generations of researchers. This is not being done properly anywhere at the moment and is an extremely important area for rapid improvement.

This article is based on a post by Jani Erola, first published on Population Europe.