The Global Representativeness Index: A Total Variation Distance Framework for Measuring Demographic Fidelity in Survey Research
Evan Hadfield
TL;DR
The paper addresses the lack of a standardized metric for demographic representativeness in global surveys by introducing the Global Representativeness Index (GRI), a symmetric measure based on Total Variation Distance (TVD) that quantifies how closely a sample’s joint demographic distribution matches global population benchmarks. It extends the framework with a Diversity Score, the Strategic Representativeness Index (SRI), and a multi-dimensional scorecard, and it analyzes the inferential cost of misrepresentation through design effects and effective sample size. Empirical validation on the Global Dialogues survey, with cross-validation on the World Values Survey, Afrobarometer, and Latinobarómetro, shows that large samples can still be far from globally representative, and that broader country coverage can raise GRI but may reduce within-country statistical power. The paper provides an open-source gri Python library with UN and Pew benchmarks, discusses the normative choices involved, and shows that reporting GRI alongside effective sample size gives a fuller picture of survey quality, supporting better design, evaluation, and accountability in AI governance and ML dataset auditing. The framework thus offers practical, interpretable metrics for quantifying and improving demographic fidelity in global data collection and AI evaluation.
Abstract
Global survey research increasingly informs high-stakes decisions in AI governance and cross-cultural policy, yet no standardized metric quantifies how well a sample's demographic composition matches its target population. Response rates and demographic quotas, the prevailing proxies for sample quality, measure effort and coverage but not distributional fidelity. This paper introduces the Global Representativeness Index (GRI), a framework grounded in Total Variation Distance that scores any survey sample against population benchmarks across multiple demographic dimensions on a [0, 1] scale. Validation on seven waves of the Global Dialogues survey (N = 7,500 across 60+ countries) finds fine-grained demographic GRI scores of only 0.33 to 0.36, roughly 43% of the theoretical maximum at that sample size. Cross-validation on the World Values Survey (seven waves, N = 403,000), Afrobarometer Round 9 (N = 53,000), and Latinobarómetro (N = 19,000) reveals that even large probability surveys score below 0.22 on fine-grained global demographics when country coverage is limited. The GRI connects to classical survey statistics through the design effect; both metrics are recommended as a minimum summary of sample quality, since GRI quantifies demographic distance symmetrically while effective N captures the asymmetric inferential cost of underrepresentation. The framework is released as an open-source Python library with UN and Pew Research Center population benchmarks, applicable to survey research, machine learning dataset auditing, and AI evaluation benchmarks.
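
To make the two recommended summary metrics concrete, the following is a minimal sketch rather than the released library's API. It assumes the GRI is one minus the Total Variation Distance between sample and benchmark shares, which is consistent with the [0, 1] scale described above but is not spelled out in the abstract. It also assumes the effective N can be approximated with Kish's formula applied to post-stratification weights (benchmark share divided by sample share). The function names, strata labels, and numbers are all hypothetical.

```python
def tvd(p, q):
    """Total Variation Distance between two distributions (dicts of stratum -> share)."""
    strata = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in strata)


def gri(sample_shares, benchmark_shares):
    """Assumed definition: GRI = 1 - TVD, so 1.0 means a perfect demographic match."""
    return 1.0 - tvd(sample_shares, benchmark_shares)


def kish_effective_n(sample_counts, benchmark_shares):
    """Kish effective sample size, (sum w)^2 / sum w^2, with one weight per respondent.

    Each respondent in stratum s gets weight benchmark_share / sample_share.
    Strata missing from the sample cannot be reweighted and are skipped here.
    """
    n = sum(sample_counts.values())
    sum_w = 0.0
    sum_w2 = 0.0
    for stratum, count in sample_counts.items():
        if count == 0:
            continue
        weight = benchmark_shares.get(stratum, 0.0) / (count / n)
        sum_w += count * weight
        sum_w2 += count * weight * weight
    return sum_w ** 2 / sum_w2


# Hypothetical three-stratum example (illustrative numbers, not from the paper).
benchmark = {"A": 0.5, "B": 0.3, "C": 0.2}
counts = {"A": 70, "B": 20, "C": 10}
n_total = sum(counts.values())
shares = {s: c / n_total for s, c in counts.items()}

score = gri(shares, benchmark)
n_eff = kish_effective_n(counts, benchmark)
design_effect = n_total / n_eff

print(f"GRI = {score:.2f}, effective N = {n_eff:.1f}, design effect = {design_effect:.2f}")
```

In this toy case the sample over-represents stratum A, giving a GRI of about 0.80 and an effective N of about 83 out of 100 respondents. The TVD term treats over- and under-representation identically, whereas the weights penalize under-represented strata more heavily, which is the asymmetry the abstract refers to.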
