City-Conditioned Memory for Multi-City Traffic and Mobility Forecasting

Wenzhang Du

TL;DR

CityCond introduces a lightweight, backbone-agnostic memory layer for multi-city traffic and mobility forecasting. It combines a CityID embedding with an optional shared memory bank (CityMem) to produce city-conditioned features that are fused into existing spatio-temporal backbones. The approach is evaluated with five backbones under three data regimes (full-data, low-data, and cross-city few-shot transfer) on METR-LA and PEMS-BAY, with auxiliary experiments on SIND. It yields consistent improvements, with the largest gains for high-capacity models such as Transformers and STGCNs, particularly in low-data and cross-city transfer scenarios. CityID alone provides a strong baseline, while CityMem adds a shared pool of reusable spatio-temporal motifs that improves cross-city generalization and data efficiency. Overall, CityCond serves as a practical, reusable design pattern for scalable multi-city forecasting under realistic data constraints, with potential extensions to richer city descriptors, region-level memories, and online learning.

Abstract

Deploying spatio-temporal forecasting models across many cities is difficult: traffic networks differ in size and topology, data availability can vary by orders of magnitude, and new cities may provide only a short history of logs. Existing deep traffic models are typically trained separately for each city and backbone, which creates high maintenance cost and transfers poorly to data-scarce cities. We ask whether a single, backbone-agnostic layer can condition on "which city this sequence comes from", improve accuracy in full- and low-data regimes, and support better cross-city adaptation with minimal code changes. We propose CityCond, a lightweight city-conditioned memory layer that augments existing spatio-temporal backbones. CityCond combines a city-ID encoder with an optional shared memory bank (CityMem). Given a city index and backbone hidden states, it produces city-conditioned features that are fused back through gated residual connections. We attach CityCond to five representative backbones (GRU, TCN, Transformer, GNN, STGCN) and evaluate three regimes on METR-LA and PEMS-BAY: full-data, low-data, and cross-city few-shot transfer. We also run auxiliary experiments on SIND, a drone-based multi-agent trajectory dataset recorded at a signalized intersection in Tianjin, focusing on pedestrian tracks. Across more than fourteen model variants and three random seeds, CityCond yields consistent improvements, with the largest gains for high-capacity backbones such as Transformers and STGCNs. CityMem reduces Transformer error by roughly one third in full-data settings and brings substantial gains in low-data and cross-city transfer. On SIND, simple city-ID conditioning modestly improves low-data LSTM performance. CityCond can therefore serve as a reusable design pattern for scalable multi-city forecasting under realistic data constraints.
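The abstract describes CityCond only at the level of its components (city-ID encoder, optional shared memory, gated residual fusion). The NumPy sketch below illustrates one plausible forward pass under our own assumptions: the class name `CityCondSketch`, the memory size `num_slots`, the scaled dot-product read from memory with the city embedding as query, and the specific gate (a sigmoid over the concatenated hidden state and city context) are illustrative choices, not the authors' implementation. Backbone hidden states are assumed to have shape (batch, time, nodes, hidden), and parameters are randomly initialized rather than trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class CityCondSketch:
    """Hypothetical CityCond-style layer (illustrative, not the authors' code).

    CityID: one learned embedding vector per city index.
    CityMem (optional, assumed design): a shared bank of key/value slots read by
    attention, using the city embedding as the query.
    Fusion (assumed form): out = h + g * (c @ W_proj), g = sigmoid([h; c] @ W_gate + b).
    """

    def __init__(self, num_cities, hidden_dim, num_slots=16, use_memory=True, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        self.city_emb = rng.normal(0.0, s, (num_cities, hidden_dim))
        self.use_memory = use_memory
        if use_memory:
            # Slots are shared by all cities, so a pattern stored once can be reused by any city.
            self.mem_keys = rng.normal(0.0, s, (num_slots, hidden_dim))
            self.mem_vals = rng.normal(0.0, s, (num_slots, hidden_dim))
        self.W_gate = rng.normal(0.0, s, (2 * hidden_dim, hidden_dim))
        self.b_gate = np.zeros(hidden_dim)
        self.W_proj = rng.normal(0.0, s, (hidden_dim, hidden_dim))

    def __call__(self, h, city_idx):
        """h: (batch, time, nodes, hidden); city_idx: integer array of shape (batch,)."""
        c = self.city_emb[city_idx]  # (B, D) CityID context
        if self.use_memory:
            scores = c @ self.mem_keys.T / np.sqrt(c.shape[-1])  # (B, S) attention logits
            c = c + softmax(scores, axis=-1) @ self.mem_vals     # (B, D) add memory read-out
        c = np.broadcast_to(c[:, None, None, :], h.shape)  # share context over time and nodes
        g = sigmoid(np.concatenate([h, c], axis=-1) @ self.W_gate + self.b_gate)
        return h + g * (c @ self.W_proj)  # gated residual: output keeps the backbone's shape
```

Setting `use_memory=False` reduces the sketch to a CityID-only variant. Because the layer consumes and returns tensors of the backbone's hidden shape, it could in principle be inserted after any backbone block, which is the backbone-agnostic property the abstract emphasizes.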

Paper Structure

This paper contains 35 sections, 7 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Overview of CityCond. A city index is embedded and optionally used to query a shared memory bank (CityMem). The resulting city-conditioned context is fused back into backbone hidden states through gated residual connections, enabling backbone-agnostic integration.
  • Figure 2: Low-data traffic forecasting performance. (a) Transformer variants as the available training fraction decreases. (b) STGCN variants under the same low-data settings.
  • Figure 3: Cross-city few-shot transfer for Transformer backbones. Adaptation curves show that CityMem speeds up convergence and lowers final error in both transfer directions.
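Figure 2 varies the fraction of training data available to each backbone. The summary does not say how the reduced training sets are drawn, so the sketch below simply assumes one common protocol: slice a city's (time, nodes) series into sliding input/target windows and keep the earliest `fraction` of them in chronological order. The helper names `make_windows` and `low_data_subset`, and the window lengths in the test, are hypothetical.

```python
import numpy as np


def make_windows(series, in_len, out_len):
    """Slice a (time, nodes) array into sliding (input, target) forecasting windows."""
    n = series.shape[0] - in_len - out_len + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window lengths")
    X = np.stack([series[t:t + in_len] for t in range(n)])                       # (n, in_len, nodes)
    Y = np.stack([series[t + in_len:t + in_len + out_len] for t in range(n)])    # (n, out_len, nodes)
    return X, Y


def low_data_subset(X, Y, fraction):
    """Keep the earliest `fraction` of chronologically ordered windows (assumed protocol)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(fraction * len(X)))  # always keep at least one window
    return X[:k], Y[:k]
```

A random subsample of windows would be an equally plausible alternative; keeping a chronological prefix mimics a newly instrumented city that has only a short history of logs, the scenario the abstract uses to motivate the low-data regime.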