Generating Spatial Synthetic Populations Using Wasserstein Generative Adversarial Network: A Case Study with EU-SILC Data for Helsinki and Thessaloniki
Vanja Falck
TL;DR
This study addresses the challenge of creating feature-rich spatial synthetic populations for agent-based simulations by leveraging a Wasserstein Generative Adversarial Network (WGAN) trained on EU-SILC microdata from Finland and Greece to model Helsinki and Thessaloniki. It compares two balancing strategies, weight-imputation and WGAN-imputation, using diverse validation metrics, including $SRMSE$, Pearson's $r$, $R^2$, and Bland-Altman plots, to assess statistical, internal, and external validity. The results show that WGAN-based balancing can closely match targeted demographic profiles (e.g., for Helsinki) but may distort fringe groups, particularly for the self-perceived health variable $PH010$, highlighting discrimination risks and the need for careful balancing and validation. The findings underscore the potential of WGANs for producing rich synthetic populations while also calling attention to ethical and methodological challenges in representing vulnerable groups, suggesting future work on robust validity frameworks and advanced generative techniques. Overall, this work advances privacy-preserving, high-dimensional synthetic population generation for urban microsimulation, with practical implications for urban planning, health, and economic forecasting.
Abstract
Agent-based social simulations can enhance our understanding of urban planning, public health, and economic forecasting. Realistic synthetic populations with many attributes strengthen these simulations. A Wasserstein Generative Adversarial Network (WGAN) trained on survey microdata such as EU-SILC can create robust synthetic populations. Balanced with external statistics or with the EU-SILC weights, the generated data can serve as spatial synthetic populations for agent-based models. Increased access to high-quality microdata has sparked interest in synthetic populations, which aim to preserve demographic profiles and analytical strength while protecting privacy and avoiding discrimination. This study uses national data from Finland and Greece to explore the generation of balanced spatial synthetic populations for Helsinki and Thessaloniki. The results reveal two challenges: balancing the data, with or without aggregated statistics for the target population, and the general under-representation of fringe profiles by deep generative methods. The latter can lead to discrimination in agent-based simulations.
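As an illustration of how the under-representation of fringe profiles might be checked, the following minimal Python sketch compares the category frequencies of one variable in a reference and a synthetic sample. It is not the study's validation code. The counts are invented toy data for a five-level variable (loosely modelled on $PH010$), the function names are hypothetical, and $SRMSE$ is computed under one common definition (root mean squared error of the bin frequencies divided by the mean reference frequency), which is assumed here and may differ from the definition used in the study.

```python
import math
from collections import Counter


def relative_frequencies(values, categories):
    """Share of observations falling in each category, in the given order."""
    counts = Counter(values)
    n = len(values)
    return [counts.get(c, 0) / n for c in categories]


def srmse(ref_freq, syn_freq):
    """SRMSE (assumed definition): RMSE over bins / mean reference frequency."""
    n_bins = len(ref_freq)
    mse = sum((s - r) ** 2 for r, s in zip(ref_freq, syn_freq)) / n_bins
    return math.sqrt(mse) / (sum(ref_freq) / n_bins)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: five ordered categories, the last one being a small
# "fringe" group. These numbers are NOT taken from EU-SILC or the study.
categories = [1, 2, 3, 4, 5]
reference = [1] * 300 + [2] * 400 + [3] * 200 + [4] * 80 + [5] * 20
synthetic = [1] * 310 + [2] * 420 + [3] * 200 + [4] * 65 + [5] * 5

ref_freq = relative_frequencies(reference, categories)
syn_freq = relative_frequencies(synthetic, categories)

fit_srmse = srmse(ref_freq, syn_freq)
fit_r = pearson_r(ref_freq, syn_freq)

# Ratio of synthetic to reference share per category; values well below 1
# flag categories that the synthetic sample under-represents.
representation = [s / r for r, s in zip(ref_freq, syn_freq)]

print(f"SRMSE: {fit_srmse:.4f}, Pearson r: {fit_r:.4f}")
for c, ratio in zip(categories, representation):
    print(f"category {c}: synthetic/reference share = {ratio:.2f}")
```

In this toy case the aggregate fit looks good while the smallest category keeps only a quarter of its reference share, which is why a per-category check may be worth running alongside summary metrics.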
