Scalable Maximum Entropy Population Synthesis via Persistent Contrastive Divergence

Mirko Degli Esposti

Abstract

Maximum entropy (MaxEnt) modelling provides a principled framework for generating synthetic populations from aggregate census data, without access to individual-level microdata. The bottleneck of existing approaches is exact expectation computation, which requires summing over the full tuple space $\mathcal{X}$ and becomes infeasible for more than $K \approx 20$ categorical attributes. We propose \emph{GibbsPCDSolver}, a stochastic replacement for this computation based on Persistent Contrastive Divergence (PCD): a persistent pool of $N$ synthetic individuals is updated by Gibbs sweeps at each gradient step, providing a stochastic approximation of the model expectations without ever materialising $\mathcal{X}$. We validate the approach on controlled benchmarks and on \emph{Syn-ISTAT}, a $K{=}15$ Italian demographic benchmark with analytically exact marginal targets derived from ISTAT-inspired conditional probability tables. Scaling experiments across $K \in \{12, 20, 30, 40, 50\}$ confirm that GibbsPCDSolver maintains $\mathrm{MRE} \in [0.010, 0.018]$ while $|\mathcal{X}|$ grows eighteen orders of magnitude, with runtime scaling as $O(K)$ rather than $O(|\mathcal{X}|)$. On Syn-ISTAT, GibbsPCDSolver reaches $\mathrm{MRE}{=}0.03$ on training constraints and -- crucially -- produces populations with effective sample size $N_{\mathrm{eff}} = N$ versus $N_{\mathrm{eff}} \approx 0.012\,N$ for generalised raking, an $86.8{\times}$ diversity advantage that is essential for agent-based urban simulations.
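The PCD scheme described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' GibbsPCDSolver: the constraint encoding (a scope list $S_j$ plus a value pattern $\mathbf{v}_j$ per constraint), the learning rate, the pool size, and all function names are assumptions, and MRE is taken to mean the mean relative error over constraint frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

def pool_freq(pool, S, v):
    """Empirical frequency of the pattern x[S] == v in the pool."""
    return float(np.mean(np.all(pool[:, S] == v, axis=1)))

def mre(est, targets):
    """Mean relative error over constraints (assumed definition of MRE)."""
    return float(np.mean(np.abs(est - targets) / targets))

def gibbs_sweep(pool, lam, constraints, cards):
    """One Gibbs sweep: resample every attribute of every individual from
    its conditional under the current model (cf. Proposition 1)."""
    N, K = pool.shape
    for k in range(K):
        logits = np.zeros((N, cards[k]))
        for j, (S, v) in enumerate(constraints):
            if k not in S:
                continue
            rest = [a for a in S if a != k]
            vrest = [v[S.index(a)] for a in rest]
            # individuals whose remaining attributes match the constraint
            match = (np.all(pool[:, rest] == vrest, axis=1)
                     if rest else np.ones(N, dtype=bool))
            logits[match, v[S.index(k)]] += lam[j]
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        u = rng.random((N, 1))
        pool[:, k] = (probs.cumsum(axis=1) > u).argmax(axis=1)
    return pool

def fit_pcd(constraints, targets, cards, N=10_000, steps=200, lr=0.5):
    """Persistent CD: the pool survives across gradient steps, so each step
    needs only one cheap Gibbs sweep instead of a fresh long chain."""
    K = len(cards)
    pool = rng.integers(0, cards, size=(N, K))
    lam = np.zeros(len(constraints))
    for _ in range(steps):
        pool = gibbs_sweep(pool, lam, constraints, cards)
        est = np.array([pool_freq(pool, S, v) for S, v in constraints])
        lam += lr * (targets - est)  # ascent direction of the MaxEnt dual
    return lam, pool
```

At the fixed point the pool frequencies match the targets in expectation, which is exactly the zero-gradient condition of the MaxEnt dual; the per-step cost is $O(N \cdot K \cdot m)$ and never touches $|\mathcal{X}|$.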


Paper Structure

This paper contains 71 sections, 1 theorem, 15 equations, 6 figures, 11 tables, 1 algorithm.

Key Result

Proposition 1

Under $p_{\bm{\lambda}}$, the conditional distribution of $A_k$ given the remaining attributes $\mathbf{x}_{-k}$ is
$$p_{\bm{\lambda}}(A_k = v \mid \mathbf{x}_{-k}) \;=\; \frac{\exp\bigl(\sum_{j \in \mathcal{J}(k, v, \mathbf{x}_{-k})} \lambda_j\bigr)}{\sum_{v'} \exp\bigl(\sum_{j \in \mathcal{J}(k, v', \mathbf{x}_{-k})} \lambda_j\bigr)},$$
where $v'$ ranges over the possible values of $A_k$ and $\mathcal{J}(k, v, \mathbf{x}_{-k})$ denotes the set of constraints $j$ such that $k \in S_j$, $v^{(k)}_j = v$, and $\mathbf{x}_{S_j \setminus \{k\}} = \mathbf{v}^{(-k)}_j$, i.e. all remaining attributes in the constraint are consistent with $\mathbf{x}_{-k}$.
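The proposition follows in one step from the exponential-family form of the MaxEnt model. The derivation below assumes $p_{\bm{\lambda}}(\mathbf{x}) \propto \exp\bigl(\sum_j \lambda_j f_j(\mathbf{x})\bigr)$ with indicator features $f_j(\mathbf{x}) = \mathbb{1}[\mathbf{x}_{S_j} = \mathbf{v}_j]$, which is consistent with the notation used here but is spelled out as an assumption:

```latex
p_{\bm{\lambda}}(A_k = v \mid \mathbf{x}_{-k})
  = \frac{p_{\bm{\lambda}}(v, \mathbf{x}_{-k})}{\sum_{v'} p_{\bm{\lambda}}(v', \mathbf{x}_{-k})}
  = \frac{\exp\bigl(\sum_{j:\, k \in S_j} \lambda_j f_j(v, \mathbf{x}_{-k})\bigr)}
         {\sum_{v'} \exp\bigl(\sum_{j:\, k \in S_j} \lambda_j f_j(v', \mathbf{x}_{-k})\bigr)}
  = \frac{\exp\bigl(\sum_{j \in \mathcal{J}(k, v, \mathbf{x}_{-k})} \lambda_j\bigr)}
         {\sum_{v'} \exp\bigl(\sum_{j \in \mathcal{J}(k, v', \mathbf{x}_{-k})} \lambda_j\bigr)}
```

Factors for constraints with $k \notin S_j$ are common to numerator and denominator and cancel; among the remaining constraints, $f_j = 1$ exactly when $j \in \mathcal{J}(k, v, \mathbf{x}_{-k})$. This is what makes each Gibbs update $O(m)$ rather than $O(|\mathcal{X}|)$.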

Figures (6)

  • Figure 1: Experiment A0 ($K{=}6$, $|\mathcal{X}|{=}216$). Left: MRE convergence curves for four pool sizes (log scale); dashed line is exact L-BFGS reference. Centre: scatter of $\hat{\bm{\lambda}}$ vs. $\bm{\lambda}^*$ at $N{=}10{,}000$; the systematic offset from the diagonal reflects a gauge degree of freedom in $\bm{\lambda}$ (see footnote), not solver error. Right: estimated frequencies $\hat{\alpha}_j$ vs. targets $\alpha_j$ at $N{=}10{,}000$; points lie close to the diagonal across the full range $[0.004,\, 0.906]$.
  • Figure 2: Experiment A1a ($K{=}8$, $|\mathcal{X}|{=}6{,}144$, $m{=}61$, $s{=}5$). Left: MRE convergence curves for four pool sizes (log scale); dashed line is the exact L-BFGS reference ($\mathrm{MRE}{=}4{\times}10^{-5}$). Centre: scatter of $\bm{\lambda}_{\mathrm{MCMC}}$ vs. $\bm{\lambda}^*$ at $N{=}50{,}000$; the constant offset from the diagonal is the gauge degree of freedom induced by unary constraints and does not affect the distribution $p_{\bm{\lambda}}$. Right: final MRE vs. $N$ (log--log); the dashed reference line follows $1/\sqrt{N}$, confirming the theoretical variance floor.
  • Figure 3: Experiment A1b ($K{=}10$, $|\mathcal{X}|{=}59{,}049$, $m{=}30$ binary constraints, $s{=}5$). Left: KL divergence $\mathrm{KL}(p_{\bm{\lambda}^*}\|p_{\bm{\lambda}_{\mathrm{MCMC}}})$ (red, left axis) and relative parameter distance $\|\Delta\bm{\lambda}\|/\|\bm{\lambda}^*\|$ (blue, right axis) vs. pool size $N$ (log--log); the dotted line shows the $O(1/N)$ reference. Centre: scatter of $\bm{\lambda}_{\mathrm{MCMC}}$ vs. $\bm{\lambda}^*$ at $N{=}50{,}000$; unlike A0--A1a, no gauge offset is present and points cluster tightly around the diagonal ($\|\Delta\bm{\lambda}\|/\|\bm{\lambda}^*\|{=}0.016$). Right: estimated constraint frequencies $\hat{\alpha}_j$ vs. exact targets $\alpha^*_j$ at $N{=}50{,}000$, for binary constraints (blue).
  • Figure 4: A2 scaling experiments ($N{=}100{,}000$, arity=3). Left: MRE of GibbsPCDSolver (solid blue) vs. exact MaxEnt of pachet2026 (dashed blue, their Table 2). The two curves use different problem instances (WuGenerator vs. NPORS); the comparison is qualitative---both achieve MRE in the $1$--$2\%$ range---and is not a claim that GibbsPCDSolver outperforms exact enumeration. At $K{=}30$ only GibbsPCDSolver is available ($|\mathcal{X}| \approx 5{\times}10^{11}$). Centre: raking $N_{\mathrm{eff}}/N$ (red diamonds, log scale, left axis) and Shannon entropy $H$ (dotted lines, right axis) for both methods. Right: fit time vs. $K$ (log scale).
  • Figure 5: Population diversity: GibbsPCDSolver vs. generalised raking ($N{=}100{,}000$, Syn-ISTAT, $K{=}15$, full training on 31 constraints). Left: raking weight distribution (log scale); the dashed line marks the uniform weight $1/N$ of GibbsPCDSolver. After convergence, raking concentrates almost all mass on $N_{\rm eff}{=}1{,}152$ effective individuals ($1.2\%$ of $N$). Centre: Lorenz curve of raking weights (Gini${=}0.951$); the diagonal is perfect equality. The shaded area between the two curves quantifies the weight concentration. Right: normalised diversity metrics for both solvers; ratios above each pair indicate the GibbsPCD advantage. GibbsPCDSolver maintains uniform pool weights by construction ($N_{\rm eff}{=}N$), achieving an $86.8{\times}$ diversity advantage in effective sample size and $3.15$ nats higher Shannon entropy than raking---essential for realistic agent-based dynamics.
  • ...and 1 more figure
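The diversity diagnostics reported in Figures 4 and 5 (effective sample size, Shannon entropy of the weights, and the Gini coefficient of the Lorenz curve) can all be computed from a weight vector alone. A minimal sketch, assuming Kish's definition $N_{\mathrm{eff}} = (\sum_i w_i)^2 / \sum_i w_i^2$ and entropy in nats; the function name is illustrative:

```python
import numpy as np

def diversity_metrics(weights):
    """Weight-concentration diagnostics for a weighted synthetic population:
    Kish effective sample size, Shannon entropy (nats), Gini coefficient."""
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()
    n_eff = 1.0 / np.sum(p ** 2)  # Kish ESS: (sum w)^2 / sum w^2
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))  # equals log(N) for uniform weights
    # Gini via the sorted-weight (Lorenz curve) formula; 0 = perfect equality
    s = np.sort(p)
    n = len(s)
    gini = (2.0 * np.sum(np.arange(1, n + 1) * s) - (n + 1)) / n
    return n_eff, entropy, gini
```

Uniform pool weights give $N_{\mathrm{eff}} = N$, entropy $\log N$, and Gini $0$, matching the GibbsPCDSolver side of Figure 5; a heavily concentrated raking weight vector drives $N_{\mathrm{eff}}$ toward a small fraction of $N$ and the Gini toward 1.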

Theorems & Definitions (2)

  • Proposition 1: Gibbs conditionals
  • Proof of Proposition 1