Generating Origin-Destination Matrices in Neural Spatial Interaction Models

Ioannis Zachos, Mark Girolami, Theodoros Damoulas

TL;DR

Addresses reconstructing a discrete $I\times J$ origin-destination matrix $T_{ij}$ from partial statistics for high-resolution ABMs, avoiding discretisation errors from continuous relaxations. Proposes GeNSIT, a neural-physics framework that jointly calibrates a continuous SIM via a neural differential equation and samples the discrete ODM space constrained by ${\mathcal{C}}$, with $O(IJ)$-scaling. Contributions include a joint/disjoint sampling framework with Markov Basis MCMC for discrete tables, neural calibration of SIM parameters $\boldsymbol{\theta}=(\alpha,\beta)$ driving $\Lambda_{ij}$ via Harris-Wilson dynamics, and validation on Cambridge ($I=69$, $J=13$) and Washington, DC ($I=J=179$) showing improved SRMSE and 99% CP at substantially lower compute. This framework extends to other contingency-table inference problems and integrates physics-based dynamics with neural calibration, while discussing limitations and social impacts.
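
The TL;DR mentions SIM parameters $\boldsymbol{\theta}=(\alpha,\beta)$ driving $\Lambda_{ij}$ via Harris-Wilson dynamics. The sketch below shows one common form of these two ingredients. It is not GeNSIT's implementation. It pairs a singly (production) constrained SIM, $\Lambda_{ij} = O_i \exp(\alpha x_j - \beta c_{ij}) / \sum_k \exp(\alpha x_k - \beta c_{ik})$ with log attractiveness $x_j=\log W_j$, with one Euler-Maruyama step of a Harris-Wilson SDE $dW_j = \epsilon W_j(D_j - \kappa W_j + \delta)\,dt + \sigma W_j\,dB_j$, where $D_j=\sum_i \Lambda_{ij}$. The function names, the cost matrix $c_{ij}$, the origin totals $O_i$ and the parameters $\epsilon,\kappa,\delta,\sigma$ are assumptions for illustration. The noise is treated in the Itô sense and $W_j$ is clipped to stay positive; both are simplifications. Here $\alpha,\beta$ are fixed numbers, whereas GeNSIT calibrates them with a neural network.

```python
import numpy as np


def singly_constrained_sim(log_w, cost, origin_totals, alpha, beta):
    """Production-constrained SIM intensity (illustrative form).

    Lambda_ij = O_i * exp(alpha * x_j - beta * c_ij) / sum_k exp(alpha * x_k - beta * c_ik),
    where x_j = log W_j is the log destination attractiveness.
    """
    utility = alpha * log_w[None, :] - beta * cost            # shape (I, J)
    utility = utility - utility.max(axis=1, keepdims=True)    # stabilise exp per row
    weights = np.exp(utility)
    return origin_totals[:, None] * weights / weights.sum(axis=1, keepdims=True)


def harris_wilson_step(log_w, lam, dt, epsilon, kappa, delta, sigma, rng):
    """One Euler-Maruyama step of an (assumed) Harris-Wilson SDE, returned in log space.

    dW_j = epsilon * W_j * (D_j - kappa * W_j + delta) dt + sigma * W_j dB_j,
    with D_j = sum_i Lambda_ij the total inflow to destination j.
    """
    w = np.exp(log_w)
    inflow = lam.sum(axis=0)                                  # D_j
    drift = epsilon * w * (inflow - kappa * w + delta)
    noise = sigma * w * np.sqrt(dt) * rng.standard_normal(w.shape)
    w_next = np.maximum(w + drift * dt + noise, 1e-12)        # keep W_j > 0
    return np.log(w_next)


# Hypothetical toy setup: 4 origins, 3 destinations.
rng = np.random.default_rng(0)
cost = rng.uniform(0.5, 2.0, size=(4, 3))
origin_totals = np.array([10.0, 20.0, 5.0, 15.0])
log_w = np.zeros(3)
for _ in range(200):
    lam = singly_constrained_sim(log_w, cost, origin_totals, alpha=1.2, beta=0.8)
    log_w = harris_wilson_step(log_w, lam, dt=0.01, epsilon=1.0,
                               kappa=1.0, delta=0.1, sigma=0.05, rng=rng)
lam = singly_constrained_sim(log_w, cost, origin_totals, alpha=1.2, beta=0.8)
```

By construction, each row of $\boldsymbol{\Lambda}$ sums to the origin total $O_i$. The Harris-Wilson step then moves destination attractiveness towards the inflow each destination receives.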

Abstract

Agent-based models (ABMs) are proliferating as decision-making tools across policy areas in transportation, economics, and epidemiology. In these models, a central object of interest is the discrete origin-destination matrix which captures spatial interactions and agent trip counts between locations. Existing approaches resort to continuous approximations of this matrix and subsequent ad-hoc discretisations in order to perform ABM simulation and calibration. This impedes conditioning on partially observed summary statistics, fails to explore the multimodal matrix distribution over a discrete combinatorial support, and incurs discretisation errors. To address these challenges, we introduce a computationally efficient framework that scales linearly with the number of origin-destination pairs, operates directly on the discrete combinatorial space, and learns the agents' trip intensity through a neural differential equation that embeds spatial interactions. Our approach outperforms the prior art in terms of reconstruction error and ground truth matrix coverage, at a fraction of the computational cost. We demonstrate these benefits in large-scale spatial mobility ABMs in Cambridge, UK and Washington, DC, USA.

Paper Structure

This paper contains 34 sections, 1 theorem, 20 equations, 10 figures, 7 tables, 1 algorithm.

Key Result

Proposition 3.1

(Adapted from diaconis1998): Let $\mu$ be a probability measure on $\mathcal{T}_{\mathcal{C}}$. Given a Markov basis $\mathcal{M}$ that satisfies the Markov basis definition of Section 3, generate a Markov chain in $\mathcal{T}_{\mathcal{C}}$ by sampling $l$ uniformly at random and moving to $\mathbf{T}' = \mathbf{T} + \eta \mathbf{f}_l$ for a choice of $\eta$. An aperiodic, …
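
Proposition 3.1 can be illustrated for the simplest constraint set: fixed row and column sums $\{\mathbf{T}_{\cdot+},\mathbf{T}_{+\cdot}\}$. For these constraints the Markov basis consists of the well-known $2\times2$ moves, which add $\eta$ to cells $(i,j)$ and $(i',j')$ and subtract $\eta$ from $(i,j')$ and $(i',j)$. The sketch below is a minimal Metropolis-Hastings version under stated assumptions, not the paper's sampler:

- The target is assumed to be $\mu(\mathbf{T})\propto\prod_{ij}\Lambda_{ij}^{T_{ij}}/T_{ij}!$, i.e. independent Poisson cells restricted to $\mathcal{T}_{\mathcal{C}}$.
- $\eta=\pm1$ is drawn with equal probability.
- A move that would create a negative cell, and so leave $\mathcal{T}_{\mathcal{C}}$, is rejected.
- The function names are hypothetical.

Constraints such as fixed cells would require a different basis.

```python
from math import lgamma, log

import numpy as np


def log_ratio(table, lam, cells, signs):
    """log mu(T') - log mu(T) for mu(T) ~ prod_ij lam_ij**T_ij / T_ij!, using only the changed cells."""
    total = 0.0
    for (i, j), s in zip(cells, signs):
        old = int(table[i, j])
        new = old + s
        total += s * log(lam[i, j]) - (lgamma(new + 1) - lgamma(old + 1))
    return total


def markov_basis_mcmc(table, lam, n_steps, rng):
    """Metropolis-Hastings over tables with fixed row and column sums via 2x2 Markov basis moves."""
    table = np.array(table, dtype=np.int64)
    n_rows, n_cols = table.shape
    for _ in range(n_steps):
        # Pick a basis move f_l uniformly: two distinct rows and two distinct columns.
        i, i2 = rng.choice(n_rows, size=2, replace=False)
        j, j2 = rng.choice(n_cols, size=2, replace=False)
        eta = int(rng.choice([-1, 1]))
        cells = [(i, j), (i2, j2), (i, j2), (i2, j)]
        signs = [eta, eta, -eta, -eta]
        # A move that creates a negative cell leaves T_C: stay at the current table.
        if any(table[c] + s < 0 for c, s in zip(cells, signs)):
            continue
        # The proposal is symmetric, so accept with probability min(1, mu(T') / mu(T)).
        if np.log(rng.uniform()) < log_ratio(table, lam, cells, signs):
            for c, s in zip(cells, signs):
                table[c] += s
    return table


# Hypothetical toy example: a 3x4 table and intensity.
rng = np.random.default_rng(1)
lam = rng.uniform(1.0, 5.0, size=(3, 4))
start = np.array([[2, 1, 0, 3], [1, 4, 2, 0], [0, 2, 3, 1]])
sample = markov_basis_mcmc(start, lam, n_steps=2000, rng=rng)
```

Each move touches only four cells, so one step costs $O(1)$ regardless of $I$ and $J$. Every accepted move preserves the row and column sums exactly.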

Figures (10)

  • Figure 1: The ground truth discrete ODM (two-way contingency table) can either be reconstructed through multiple expensive ABM simulations anirudh2022 (ABM rectangle) or be approximated by a continuous representation $\boldsymbol{\Lambda}$ coupled with the Harris-Wilson SDE (SIM rectangle). In the latter case, the ground truth can be reconstructed by sampling in the discrete combinatorial space of constrained ODMs conditioned on $\boldsymbol{\Lambda}$ (GeNSIT rectangle). ABM simulations scale as $\mathcal{O}(M\log M)$, whereas GeNSIT scales as $\mathcal{O}(IJ)$, where $M\gg I+J$ is the size of the agent interaction graph.
  • Figure 2: The space $\mathcal{T}_{\mathcal{C}}$ of $3\times3$ discrete ODMs with summary statistics $\mathcal{C}_{T}$. Sampling on the continuous relaxation of $\mathcal{T}_{\mathcal{C}}$ ($\boldsymbol{\Lambda}$ level) with quantisation can lead to either large rejection rates or poor exploration of the distribution over $\mathcal{T}_{\mathcal{C}}$.
  • Figure 3: GeNSIT: (a) successive iterations of Alg. 1 for a given ensemble member; (b) plate diagram for each iteration and ensemble member. We propose two schemes, Joint and Disjoint (see the appendix on disjoint vs. joint schemes for details). Unlike the Disjoint scheme, the Joint scheme passes table $\mathbf{T}$ information to the loss $\mathcal{L}$ (see (b)). We perform an optimisation step in the intensity $\boldsymbol{\Lambda}$ space and a sampling step in $\mathbf{T}$ space, with complexities $\mathcal{O}(\tau J+IJ)$ and $\mathcal{O}(IJ)$, respectively. The intensity $\boldsymbol{\Lambda}$ arises from the well-known family of totally and singly constrained SIMs coupled with the Harris-Wilson SDE (HW-SDE). The $\mathbf{T}$ sampling step generates discrete $\mathcal{C}_{T}$-constrained ODMs, unlike ellam2018, gaskin2023, which operate only at the continuous mean-field level $\boldsymbol{\Lambda}$.
  • Figure 4: Total computation time (left) of GeNSIT and SIM-NN gaskin2023 versus the number of origin-destination pairs $IJ$, with $I\times J$ equal to $100\times100, 200\times200, \dots, 1000\times1000$. Both algorithms are run for $N=10^3$ iterations with $\mathcal{C}_{T}=\{\mathbf{T}_{\cdot+},\mathbf{T}_{+\cdot},\mathbf{T}_{\mathcal{X}_{50\%}}\}$. The constraint $\mathbf{T}_{\mathcal{X}_{50\%}}$ means that 50% of the table cells, chosen uniformly at random, are fixed. Total computation time is the sum of the $\mathbf{T}$ sampling (middle) and $\boldsymbol{\Lambda}$ learning (right) times, measured over the table-sampling and intensity-learning blocks of Alg. 1, respectively. Our framework scales linearly with $IJ$, which is much faster than SIM-MCMC and SIT-MCMC. SIM-NN does not operate in the discrete table space at all, which explains its lower computation time.
  • Figure 5: SRMSE of GeNSIT's discrete ODM sampling (the table-sampling step of Alg. 1) by total number of agents $A$ (left) and ODM dimension $I\times J$ (right), for $N=10^4$ iterations, a fixed intensity $\boldsymbol{\Lambda}$ and $\mathcal{C}_{T}=\{\mathbf{T}_{\cdot+},\mathbf{T}_{\mathcal{X}_{50\%}}\}$. On the left, $I\times J=150\times150$. The reconstruction error (SRMSE) scales linearly in $IJ$ and exponentially in $A=T_{++}$.
  • ...and 5 more figures
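
Figure 5 and the TL;DR report reconstruction error as SRMSE. The sketch below implements the standardised root mean square error as it is commonly defined in spatial interaction modelling: the RMSE over all $IJ$ cells, divided by the mean ground-truth cell value $T_{++}/(IJ)$. Two points are assumptions here: that the paper uses exactly this normalisation, and the helper name `srmse`, which is hypothetical.

```python
import numpy as np


def srmse(estimate, ground_truth):
    """Standardised RMSE: RMSE over all I*J cells divided by the mean ground-truth cell value."""
    est = np.asarray(estimate, dtype=float)
    truth = np.asarray(ground_truth, dtype=float)
    n_cells = truth.size
    rmse = np.sqrt(np.sum((est - truth) ** 2) / n_cells)
    return rmse / (truth.sum() / n_cells)


truth = np.array([[2, 2], [2, 2]])
estimate = np.array([[3, 1], [3, 1]])
print(srmse(estimate, truth))  # → 0.5
```

In a sampling setting, `estimate` would typically be a summary of the sampled tables, such as their mean `np.mean(samples, axis=0)`. Dividing by the mean cell value makes errors comparable across tables with different total agent counts.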

Theorems & Definitions (3)

  • Definition 3.1
  • Definition 3.2
  • Proposition 3.1