Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks

Francesco Cozzi, Marco Pangallo, Alan Perotti, André Panisson, Corrado Monti

TL;DR

The paper tackles the challenge of aligning agent-based models (ABMs) with real data by learning a differentiable surrogate that preserves micro-level, decentralized dynamics. It introduces the Graph Diffusion Network (GDN), which combines a graph neural network that models local interactions with a conditional diffusion model that learns per-agent transition distributions from ABM-generated data. The approach enables gradient-based calibration and direct estimation of agent states. On Schelling's segregation model and a Predator-Prey ecosystem, the surrogate reproduces both micro-level transitions and emergent macro-level behavior beyond the training horizon. This data-driven framework paves the way for flexible, interpretable ABM surrogates with potential applications across economics, epidemiology, urban science, and ecology.

Abstract

Agent-Based Models (ABMs) are powerful tools for studying emergent properties in complex systems. In ABMs, agent behaviors are governed by local interactions and stochastic rules. However, these rules are, in general, non-differentiable, limiting the use of gradient-based methods for optimization, and thus integration with real-world data. We propose a novel framework to learn a differentiable surrogate of any ABM by observing its generated data. Our method combines diffusion models to capture behavioral stochasticity and graph neural networks to model agent interactions. Distinct from prior surrogate approaches, our method introduces a fundamental shift: rather than approximating system-level outputs, it models individual agent behavior directly, preserving the decentralized, bottom-up dynamics that define ABMs. We validate our approach on two ABMs (Schelling's segregation model and a Predator-Prey ecosystem) showing that it replicates individual-level patterns and accurately forecasts emergent dynamics beyond training. Our results demonstrate the potential of combining diffusion models and graph learning for data-driven ABM simulation.
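To make "observing its generated data" concrete, here is a minimal sketch of how training data with several possible next states per observed state could be collected from a stochastic ABM. Everything in it is an assumption for illustration: the toy flip rule, the ring topology, and the names `step`, `generate_ramifications`, `flip_scale`, and `n_branches` are hypothetical and are not the paper's models or code.

```python
import random

# Hypothetical toy rule (not one of the paper's ABMs): each agent holds a
# binary state and flips with a probability that grows with the share of
# neighbors that disagree with it.
def step(states, neighbors, rng, flip_scale=0.5):
    new_states = []
    for i, s in enumerate(states):
        nbrs = neighbors[i]
        disagree = sum(1 for j in nbrs if states[j] != s) / len(nbrs) if nbrs else 0.0
        # Comparing a random draw with a threshold is a discrete decision,
        # which is what makes rules of this kind non-differentiable.
        flip = rng.random() < flip_scale * disagree
        new_states.append(1 - s if flip else s)
    return new_states

def generate_ramifications(initial, neighbors, n_steps, n_branches, seed=0):
    """Collect (state, [possible next states]) pairs along one trajectory."""
    rng = random.Random(seed)
    dataset = []
    current = list(initial)
    for _ in range(n_steps):
        # Sample several alternative next states from the same current state.
        branches = [step(current, neighbors, rng) for _ in range(n_branches)]
        dataset.append((current, branches))
        # Keep one branch to continue the trajectory.
        current = rng.choice(branches)
    return dataset

# Six agents on a ring, each connected to its two adjacent agents.
neighbors = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
data = generate_ramifications([0, 1, 0, 1, 1, 0], neighbors, n_steps=3, n_branches=4)
```

Each entry of `data` pairs one state with several sampled successors, so a learner sees samples from a per-agent transition distribution rather than a single outcome per state.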

Paper Structure

This paper contains 37 sections, 15 equations, 20 figures, 9 tables, 2 algorithms.

Figures (20)

  • Figure 1: Overview of the training and generation pipeline for differentiable surrogates of Agent-Based Models. The top-left panel illustrates the training process: we run simulations using the original ABM, and use the resulting trajectories to train the differentiable surrogate. The top-right panel shows the structure of the ABM-generated data using the Predator-Prey model as an example: a state at time step $t{-}1$ gives rise to multiple possible states at time $t$, one of which is chosen to generate further possible states at $t{+}1$. Colored cells highlight the behavior of a specific "prey" agent — green for "move," red for "die," and pink for "reproduce." The bottom panel shows the generation phase: given a new observed state, the trained surrogate simulates plausible future states.
  • Figure 2: Evolution of the position of black and red agents in the Schelling model, for three simulation runs, one for each of the considered tolerance thresholds $\xi_1=0.625$, $\xi_2=0.75$, $\xi_3=0.875$ (left, center, and right panel). We compare the ground truth (top row) with our surrogate (middle row) and with the best-performing ablation (according to sMAPE, bottom row). For each panel and model, we show three time steps: $t=0$ (initial conditions, same for each column but kept for clarity), $t=15$, and $t=30$.
  • Figure 3: Forecasting macro-level summary statistics (here, the number of living prey and predators over time), starting from the last condition seen in training, for 100 independent simulation runs, under configuration $\Psi_1$ (oscillations for both predators and prey, top) and $\Psi_4$ (oscillations only for predators, bottom). From left to right: original ABM simulations, surrogate, diffusion-only ablation, GNN-only ablation. The dashed vertical line indicates the end of the training phase for the surrogate and ablation models.
  • Figure 4: Errors obtained by the proposed approach (Surrogate) and by the naive baselines (Diffusion-only and GNN-only ablation models) in four different tasks. In the first column, error is measured as the EMD between the true and predicted distributions of individual (micro-level) behavior, i.e., predicting the next state of each agent from the previous one. In the second column, error is measured as the difference (sMAPE) in system-level quantities, i.e., comparing the true number of agents in a given state with the number predicted by our model when trained on a fraction of the initial time steps (as in Figure 3). In the first row, we test three configurations of the Schelling model; in the second row, we compare four configurations of the Predator-Prey model.
  • Figure 5: Training time and micro- and macro-level metrics as a function of the number of ramifications provided during training, for 10 experiments on one Predator-Prey dataset with parameter set $\Psi_1$. Points indicate the mean and error bars the standard deviation over the 10 experiments.
  • ...and 15 more figures
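
The captions above report a micro-level error as EMD and a macro-level error as sMAPE. As a rough illustration of what these two quantities measure, the sketch below uses one common sMAPE variant and the one-dimensional earth mover's distance between two equally sized samples. The exact definitions used in the paper are not given in this excerpt, so both formulas and the function names `smape` and `emd_1d` are assumptions.

```python
def smape(true_vals, pred_vals):
    """Symmetric mean absolute percentage error, in [0, 2] (one common variant)."""
    terms = []
    for y, p in zip(true_vals, pred_vals):
        denom = abs(y) + abs(p)
        # Convention assumed here: a pair of zeros contributes no error.
        terms.append(0.0 if denom == 0 else 2.0 * abs(y - p) / denom)
    return sum(terms) / len(terms)

def emd_1d(samples_a, samples_b):
    """Earth mover's distance between two 1-D samples of equal size."""
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch only handles equally sized samples")
    # In one dimension, the optimal transport plan matches sorted samples.
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Macro-level example: true vs. predicted agent counts over three time steps.
macro_error = smape([100, 120, 90], [100, 120, 90])   # identical series
# Micro-level example: two samples that differ by a constant shift of 1.
micro_error = emd_1d([0, 1, 2], [1, 2, 3])
```

Identical series give an sMAPE of 0, and shifting every sample by a constant gives an EMD equal to that shift, which is a quick sanity check for both functions.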