Mapineq Link: Real-world data for research

By Mijail Figueroa González

Mapineq Link Phase 3 highlights the potential of innovative geo-located socioeconomic indicators, drawn from commercial and unconventional sources, to uncover new ways in which social inequalities are produced.

Geo-located indicators derived from real-world data will be available programmatically via an API. They can also be explored through a user-friendly interactive dashboard, set to launch in October 2024.

What is real-world data?

Real-world data refers to data from a wide range of sources: social media (e.g., Facebook, Instagram, TikTok, Twitter/X); professional networking (e.g., LinkedIn) and job sites (e.g., Monster.com); media services (e.g., Spotify, Netflix, mobile phones); internet searches and viewing (e.g., Google, YouTube); blogs and discussion forums (e.g., Reddit); wearables, body monitors and apps that measure bodily functions (e.g., Fitbit, period trackers, diet apps); financial and loyalty cards (e.g., transactions, supermarket cards); the internet of things (IoT) (e.g., footfall counters, air quality sensors); housing (e.g., sales and rental transactions); and beyond (see image below).

Compared with classical survey or register data, real-world data generally offers more diverse, rapid and high-dimensional data with fine temporal and spatial granularity, enabling real-time or near-real-time estimates. In addition to timely information on behaviour, real-world data often leaves analysable traces, such as locations or time stamps, that can be leveraged or linked for scientific research in the public interest.

Image: Classic and real-world data

What can be done with real-world data?

In her latest report on the Mapineq Link database, Professor Melinda Mills provides an overview of research applications of real-world data relevant to our understanding of social inequalities.

In particular, she offers insights into eleven cases, ranging from using satellite images to estimate populations in remote or hard-to-reach areas to leveraging Facebook advertising data to understand and predict human migration and mobility accurately and in real time.

Limitations, challenges and ethics of real-world data

Professor Mills highlights the main limitations of real-world data. These include a lack of representativeness, owing to the selective coverage of particular groups and geographies, and the absence of basic demographic information about the people in the data.

Accessing real-world data for public interest research presents challenges, starting with prices that are prohibitive for university-based research and the fact that access for non-commercial, public interest research is often restricted on the grounds of protecting individual rights. Web scraping has emerged as an alternative way of accessing such data; however, it raises legal and ethical concerns about the data it retrieves.

Professor Mills concludes her report by discussing the legal and ethical dimensions of web scraping, the ethics of using data derived from security breaches, the risks of re-identification and harm, and misuse versus ‘missed use’ of real-world data.

The report, Mapineq Link: Leveraging real-world data for research, is currently under embargo. It is the third of four modules of the geo-linked inequality database and will be openly accessible in autumn 2024.

Do you want to learn more?

Explore our previous releases: the Phase 1 geospatial social and economic policy database and the Phase 2 physical environmental geo-linked indicators.

Previously in news: Mapineq Link: two significant steps closer to realisation.