Scalable Maximum Entropy Population Synthesis via Persistent Contrastive Divergence

Mirko Degli Esposti

Abstract

Maximum entropy (MaxEnt) modelling provides a principled framework for generating synthetic populations from aggregate census data, without access to individual-level microdata. The bottleneck of existing approaches is exact expectation computation, which requires summing over the full tuple space $\mathcal{X}$ and becomes infeasible for more than $K \approx 20$ categorical attributes. We propose \emph{GibbsPCDSolver}, a stochastic replacement for this computation based on Persistent Contrastive Divergence (PCD): a persistent pool of $N$ synthetic individuals is updated by Gibbs sweeps at each gradient step, providing a stochastic approximation of the model expectations without ever materialising $\mathcal{X}$. We validate the approach on controlled benchmarks and on \emph{Syn-ISTAT}, a $K{=}15$ Italian demographic benchmark with analytically exact marginal targets derived from ISTAT-inspired conditional probability tables. Scaling experiments across $K \in \{12, 20, 30, 40, 50\}$ confirm that GibbsPCDSolver maintains $\mathrm{MRE} \in [0.010, 0.018]$ while $|\mathcal{X}|$ grows eighteen orders of magnitude, with runtime scaling as $O(K)$ rather than $O(|\mathcal{X}|)$. On Syn-ISTAT, GibbsPCDSolver reaches $\mathrm{MRE}{=}0.03$ on training constraints and -- crucially -- produces populations with effective sample size $N_{\mathrm{eff}} = N$ versus $N_{\mathrm{eff}} \approx 0.012\,N$ for generalised raking, an $86.8{\times}$ diversity advantage that is essential for agent-based urban simulations.
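The PCD scheme described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' GibbsPCDSolver: the constraint encoding (a scope list $S_j$ plus a value pattern $\mathbf{v}_j$ per constraint), the learning rate, the pool size, and all function names are assumptions, and MRE is taken to mean the mean relative error over constraint frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

def pool_freq(pool, S, v):
    """Empirical frequency of the pattern x[S] == v in the pool."""
    return float(np.mean(np.all(pool[:, S] == v, axis=1)))

def mre(est, targets):
    """Mean relative error over constraints (assumed definition of MRE)."""
    return float(np.mean(np.abs(est - targets) / targets))

def gibbs_sweep(pool, lam, constraints, cards):
    """One Gibbs sweep: resample every attribute of every individual from
    its conditional under the current model (cf. Proposition 1)."""
    N, K = pool.shape
    for k in range(K):
        logits = np.zeros((N, cards[k]))
        for j, (S, v) in enumerate(constraints):
            if k not in S:
                continue
            rest = [a for a in S if a != k]
            vrest = [v[S.index(a)] for a in rest]
            # individuals whose remaining attributes match the constraint
            match = (np.all(pool[:, rest] == vrest, axis=1)
                     if rest else np.ones(N, dtype=bool))
            logits[match, v[S.index(k)]] += lam[j]
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        u = rng.random((N, 1))
        pool[:, k] = (probs.cumsum(axis=1) > u).argmax(axis=1)
    return pool

def fit_pcd(constraints, targets, cards, N=10_000, steps=200, lr=0.5):
    """Persistent CD: the pool survives across gradient steps, so each step
    needs only one cheap Gibbs sweep instead of a fresh long chain."""
    K = len(cards)
    pool = rng.integers(0, cards, size=(N, K))
    lam = np.zeros(len(constraints))
    for _ in range(steps):
        pool = gibbs_sweep(pool, lam, constraints, cards)
        est = np.array([pool_freq(pool, S, v) for S, v in constraints])
        lam += lr * (targets - est)  # ascent direction of the MaxEnt dual
    return lam, pool
```

At the fixed point the pool frequencies match the targets in expectation, which is exactly the zero-gradient condition of the MaxEnt dual; the per-step cost is $O(N \cdot K \cdot m)$ and never touches $|\mathcal{X}|$.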


Paper Structure

This paper contains 71 sections, 1 theorem, 15 equations, 6 figures, 11 tables, 1 algorithm.

Key Result

Proposition 1

Under $p_{\bm{\lambda}}$, the conditional distribution of $A_k$ given the remaining attributes $\mathbf{x}_{-k}$ is
$$p_{\bm{\lambda}}(A_k = v \mid \mathbf{x}_{-k}) \;=\; \frac{\exp\bigl(\sum_{j \in \mathcal{J}(k, v, \mathbf{x}_{-k})} \lambda_j\bigr)}{\sum_{v'} \exp\bigl(\sum_{j \in \mathcal{J}(k, v', \mathbf{x}_{-k})} \lambda_j\bigr)},$$
where $v'$ ranges over the possible values of $A_k$ and $\mathcal{J}(k, v, \mathbf{x}_{-k})$ denotes the set of constraints $j$ such that $k \in S_j$, $v^{(k)}_j = v$, and $\mathbf{x}_{S_j \setminus \{k\}} = \mathbf{v}^{(-k)}_j$, i.e. all remaining attributes in the constraint are consistent with $\mathbf{x}_{-k}$.
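The proposition follows in one step from the exponential-family form of the MaxEnt model. The derivation below assumes $p_{\bm{\lambda}}(\mathbf{x}) \propto \exp\bigl(\sum_j \lambda_j f_j(\mathbf{x})\bigr)$ with indicator features $f_j(\mathbf{x}) = \mathbb{1}[\mathbf{x}_{S_j} = \mathbf{v}_j]$, which is consistent with the notation used here but is spelled out as an assumption:

```latex
p_{\bm{\lambda}}(A_k = v \mid \mathbf{x}_{-k})
  = \frac{p_{\bm{\lambda}}(v, \mathbf{x}_{-k})}{\sum_{v'} p_{\bm{\lambda}}(v', \mathbf{x}_{-k})}
  = \frac{\exp\bigl(\sum_{j:\, k \in S_j} \lambda_j f_j(v, \mathbf{x}_{-k})\bigr)}
         {\sum_{v'} \exp\bigl(\sum_{j:\, k \in S_j} \lambda_j f_j(v', \mathbf{x}_{-k})\bigr)}
  = \frac{\exp\bigl(\sum_{j \in \mathcal{J}(k, v, \mathbf{x}_{-k})} \lambda_j\bigr)}
         {\sum_{v'} \exp\bigl(\sum_{j \in \mathcal{J}(k, v', \mathbf{x}_{-k})} \lambda_j\bigr)}
```

Factors for constraints with $k \notin S_j$ are common to numerator and denominator and cancel; among the remaining constraints, $f_j = 1$ exactly when $j \in \mathcal{J}(k, v, \mathbf{x}_{-k})$. This is what makes each Gibbs update $O(m)$ rather than $O(|\mathcal{X}|)$.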

Figures (6)

  • Figure 1: Experiment A0 ($K{=}6$, $|\mathcal{X}|{=}216$). Left: MRE convergence curves for four pool sizes (log scale); dashed line is exact L-BFGS reference. Centre: scatter of $\hat{\bm{\lambda}}$ vs. $\bm{\lambda}^*$ at $N{=}10{,}000$; the systematic offset from the diagonal reflects a gauge degree of freedom in $\bm{\lambda}$ (see footnote), not solver error. Right: estimated frequencies $\hat{\alpha}_j$ vs. targets $\alpha_j$ at $N{=}10{,}000$; points lie close to the diagonal across the full range $[0.004,\, 0.906]$.
  • Figure 2: Experiment A1a ($K{=}8$, $|\mathcal{X}|{=}6{,}144$, $m{=}61$, $s{=}5$). Left: MRE convergence curves for four pool sizes (log scale); dashed line is the exact L-BFGS reference ($\mathrm{MRE}{=}4{\times}10^{-5}$). Centre: scatter of $\bm{\lambda}_{\mathrm{MCMC}}$ vs. $\bm{\lambda}^*$ at $N{=}50{,}000$; the constant offset from the diagonal is the gauge degree of freedom induced by unary constraints and does not affect the distribution $p_{\bm{\lambda}}$. Right: final MRE vs. $N$ (log--log); the dashed reference line follows $1/\sqrt{N}$, confirming the theoretical variance floor.
  • Figure 3: Experiment A1b ($K{=}10$, $|\mathcal{X}|{=}59{,}049$, $m{=}30$ binary constraints, $s{=}5$). Left: KL divergence $\mathrm{KL}(p_{\bm{\lambda}^*}\|p_{\bm{\lambda}_{\mathrm{MCMC}}})$ (red, left axis) and relative parameter distance $\|\Delta\bm{\lambda}\|/\|\bm{\lambda}^*\|$ (blue, right axis) vs. pool size $N$ (log--log); the dotted line shows the $O(1/N)$ reference. Centre: scatter of $\bm{\lambda}_{\mathrm{MCMC}}$ vs. $\bm{\lambda}^*$ at $N{=}50{,}000$; unlike A0--A1a, no gauge offset is present and points cluster tightly around the diagonal ($\|\Delta\bm{\lambda}\|/\|\bm{\lambda}^*\|{=}0.016$). Right: estimated constraint frequencies $\hat{\alpha}_j$ vs. exact targets $\alpha^*_j$ at $N{=}50{,}000$, for binary constraints (blue).
  • Figure 4: A2 scaling experiments ($N{=}100{,}000$, arity=3). Left: MRE of GibbsPCDSolver (solid blue) vs. exact MaxEnt of pachet2026 (dashed blue, their Table 2). The two curves use different problem instances (WuGenerator vs. NPORS); the comparison is qualitative---both achieve MRE in the $1$--$2\%$ range---and is not a claim that GibbsPCDSolver outperforms exact enumeration. At $K{=}30$ only GibbsPCDSolver is available ($|\mathcal{X}| \approx 5{\times}10^{11}$). Centre: raking $N_{\mathrm{eff}}/N$ (red diamonds, log scale, left axis) and Shannon entropy $H$ (dotted lines, right axis) for both methods. Right: fit time vs. $K$ (log scale).
  • Figure 5: Population diversity: GibbsPCDSolver vs. generalised raking ($N{=}100{,}000$, Syn-ISTAT, $K{=}15$, full training on 31 constraints). Left: raking weight distribution (log scale); the dashed line marks the uniform weight $1/N$ of GibbsPCDSolver. After convergence, raking concentrates almost all mass on $N_{\rm eff}{=}1{,}152$ effective individuals ($1.2\%$ of $N$). Centre: Lorenz curve of raking weights (Gini${=}0.951$); the diagonal is perfect equality. The shaded area between the two curves quantifies the weight concentration. Right: normalised diversity metrics for both solvers; ratios above each pair indicate the GibbsPCD advantage. GibbsPCDSolver maintains uniform pool weights by construction ($N_{\rm eff}{=}N$), achieving an $86.8{\times}$ diversity advantage in effective sample size and $3.15$ nats higher Shannon entropy than raking---essential for realistic agent-based dynamics.
  • ...and 1 more figure
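The diversity diagnostics reported in Figures 4 and 5 (effective sample size, Shannon entropy of the weights, and the Gini coefficient of the Lorenz curve) can all be computed from a weight vector alone. A minimal sketch, assuming Kish's definition $N_{\mathrm{eff}} = (\sum_i w_i)^2 / \sum_i w_i^2$ and entropy in nats; the function name is illustrative:

```python
import numpy as np

def diversity_metrics(weights):
    """Weight-concentration diagnostics for a weighted synthetic population:
    Kish effective sample size, Shannon entropy (nats), Gini coefficient."""
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()
    n_eff = 1.0 / np.sum(p ** 2)  # Kish ESS: (sum w)^2 / sum w^2
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))  # equals log(N) for uniform weights
    # Gini via the sorted-weight (Lorenz curve) formula; 0 = perfect equality
    s = np.sort(p)
    n = len(s)
    gini = (2.0 * np.sum(np.arange(1, n + 1) * s) - (n + 1)) / n
    return n_eff, entropy, gini
```

Uniform pool weights give $N_{\mathrm{eff}} = N$, entropy $\log N$, and Gini $0$, matching the GibbsPCDSolver side of Figure 5; a heavily concentrated raking weight vector drives $N_{\mathrm{eff}}$ toward a small fraction of $N$ and the Gini toward 1.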

Theorems & Definitions (2)

  • Proposition 1: Gibbs conditionals
  • Proof of Proposition 1