Exact Synthetic Populations for Scalable Societal and Market Modeling
Thierry Petit, Arnault Pachot
TL;DR
The paper presents a constraint programming framework to generate exact synthetic populations that match target distributions while guaranteeing individual-level coherence without microdata. It uses a batched solving approach with distribution constraints, optional diversity constraints, and support for interdependent features, enabling scalable generation for applications like virtual polling, territorial intelligence, and AI-driven content evaluation. The authors demonstrate near-exact distribution matches on real statistics, explore trade-offs between constraint density and accuracy, and highlight practical uses through the Pollitics platform, which couples CP-generated agents with large language models. The work emphasizes data privacy, reproducibility, and data-validation capabilities, proposing a principled path to simulate complex populations for policy, market, and communications analytics.
Abstract
We introduce a constraint-programming framework for generating synthetic populations that reproduce target statistics with high precision while enforcing full individual consistency. Unlike data-driven approaches that infer distributions from samples, our method directly encodes aggregated statistics and structural relations, enabling exact control of demographic profiles without requiring any microdata. We validate the approach on official demographic sources and study the impact of distributional deviations on downstream analyses. This work is conducted within the Pollitics project developed by Emotia, where synthetic populations can be queried through large language models to model societal behaviors, explore market and policy scenarios, and provide reproducible decision-grade insights without personal data.
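To make the idea of encoding aggregated statistics directly more concrete, here is a minimal sketch. It is not the authors' implementation: it uses a toy backtracking search in plain Python as a stand-in for a real constraint solver. The attribute names, the target counts, and the coherence rule `is_coherent` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical target counts for a population of 10 individuals (illustrative only).
AGE_TARGETS: Dict[str, int] = {"child": 2, "adult": 6, "senior": 2}
STATUS_TARGETS: Dict[str, int] = {"student": 3, "employed": 5, "retired": 2}


def is_coherent(age: str, status: str) -> bool:
    """Assumed individual-level rule: children are students, only seniors are retired."""
    if age == "child":
        return status == "student"
    if status == "retired":
        return age == "senior"
    return True


def generate(
    n: int, age_left: Dict[str, int], status_left: Dict[str, int]
) -> Optional[List[Tuple[str, str]]]:
    """Backtracking search: build n individuals that consume the remaining quotas exactly."""
    if n == 0:
        return []
    for age in age_left:
        if age_left[age] == 0:
            continue  # quota for this age group is exhausted
        for status in status_left:
            if status_left[status] == 0 or not is_coherent(age, status):
                continue
            # Tentatively assign this individual, then solve the rest.
            age_left[age] -= 1
            status_left[status] -= 1
            rest = generate(n - 1, age_left, status_left)
            # Restore the quotas so the caller's dictionaries are unchanged.
            age_left[age] += 1
            status_left[status] += 1
            if rest is not None:
                return [(age, status)] + rest
    return None  # no coherent assignment matches the remaining quotas


population = generate(10, dict(AGE_TARGETS), dict(STATUS_TARGETS))
if population is not None:
    print(Counter(age for age, _ in population))
    print(Counter(status for _, status in population))
```

In this toy setting the marginal counts are matched exactly by construction, because a value can only be assigned while its remaining quota is positive, and every individual passes the coherence rule. The exhaustive search is only suitable for tiny instances; a real constraint-programming model would typically rely on propagation and, as described for the paper's framework, batched solving.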
