Weaver: Kronecker Product Approximations of Spatiotemporal Attention for Traffic Network Forecasting

Christopher Cheong, Gary Davis, Seongjin Choi

TL;DR

Weaver introduces a scalable spatiotemporal forecasting framework for traffic networks that decomposes full spatiotemporal attention via Kronecker product approximations, enabling efficient P²-KMV message passing on a Kronecker-TEN representation. It couples local signed spatial and temporal attention with a Traffic Phase Dictionary for self-conditioning, and it uses the continuous Tanimoto coefficient to model negative traffic interactions stably. The approach achieves competitive accuracy on PEMS-BAY and METR-LA while delivering strong training efficiency and robustness under missing data. Ablations show that the Kronecker attention, valence attention, and phase dictionary each contribute to stability and performance, particularly at longer forecast horizons. The work offers a principled, physics-inspired, graph-based perspective, with potential extensions to transferability, geometry-aware retrieval, and physics-informed modeling.
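To make "Kronecker product approximation of a spatiotemporal attention map" concrete, the sketch below uses the classical nearest-Kronecker-product construction of Van Loan and Pitsianis: rearrange a PN × PN matrix so that every Kronecker product becomes rank-1, then keep the top singular pair. This is a swapped-in textbook technique shown only for illustration; it is not claimed to be how Weaver obtains its temporal and spatial factors. The function name `nearest_kronecker` and the toy sizes are hypothetical, and the ordering Θ_T ⊗ Θ_S (time as the outer index) is an assumption chosen to match the Kronecker-TEN notation.

```python
import numpy as np


def nearest_kronecker(A: np.ndarray, P: int, N: int):
    """Best Frobenius-norm approximation A ~ B (x) C, B: (P, P), C: (N, N).

    Illustrative only: classical Van Loan-Pitsianis rearrangement + SVD,
    not necessarily the procedure Weaver uses.
    """
    if A.shape != (P * N, P * N):
        raise ValueError("A must have shape (P*N, P*N)")
    # Rearrange so that A[i*N + k, j*N + l] -> R[i*P + j, k*N + l];
    # under this map, kron(B, C) becomes the rank-1 matrix vec(B) vec(C)^T.
    R = A.reshape(P, N, P, N).transpose(0, 2, 1, 3).reshape(P * P, N * N)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    scale = np.sqrt(s[0])
    B = scale * U[:, 0].reshape(P, P)  # temporal factor (P x P)
    C = scale * Vt[0].reshape(N, N)    # spatial factor (N x N)
    return B, C


rng = np.random.default_rng(0)
P, N = 3, 4  # hypothetical toy sizes: P time steps, N nodes
theta_T, theta_S = rng.random((P, P)), rng.random((N, N))
# A "full" spatiotemporal map that is close to, but not exactly, a Kronecker product
theta_ST = np.kron(theta_T, theta_S) + 0.01 * rng.standard_normal((P * N, P * N))
B, C = nearest_kronecker(theta_ST, P, N)
print(np.linalg.norm(theta_ST - np.kron(B, C)))  # small residual
```

By the Eckart-Young theorem, the top singular pair gives the optimal Kronecker factors in Frobenius norm, so the residual is never larger than that of the noise-free factors used to build the toy map.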

Abstract

Spatiotemporal forecasting on transportation networks is a complex task that requires understanding how traffic nodes interact within a dynamic, evolving system governed by traffic flow dynamics and social behavioral patterns. The importance of transportation networks and intelligent transportation systems (ITS) for modern mobility and commerce calls for forecasting models that are not only accurate but also interpretable, efficient, and robust under structural or temporal perturbations. Recent approaches, particularly Transformer-based architectures, have improved predictive performance, but often at the cost of high computational overhead and reduced architectural interpretability. In this work, we introduce Weaver, a novel attention-based model that applies Kronecker product approximations (KPA) to decompose the PN × PN spatiotemporal attention map over P time steps and N nodes, which has O(P²N²) complexity, into local P × P temporal and N × N spatial attention maps. This Kronecker attention map enables our Parallel-Kronecker Matrix-Vector product (P²-KMV) for efficient spatiotemporal message passing with O(P²N + N²P) complexity. To capture real-world traffic dynamics, we model the negative edges that are important for describing traffic behavior by introducing Valence Attention based on the continuous Tanimoto coefficient (CTC), whose properties support precise latent graph generation and stable training. To fully utilize the model's learning capacity, we introduce the Traffic Phase Dictionary for self-conditioning. Evaluations on PEMS-BAY and METR-LA show that Weaver achieves competitive performance across model categories while training more efficiently.
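The O(P²N + N²P) cost follows from the standard identity for multiplying by a Kronecker product without ever forming it: applying Θ_T ⊗ Θ_S to the flattened state equals mixing along time with Θ_T and then along space with Θ_S. A minimal NumPy sketch, assuming the system state is stored as X ∈ R^{P×N×C} with node (p, n) at flat index pN + n. The function names are hypothetical, and this shows only the serial core of the identity, not the authors' parallel P²-KMV.

```python
import numpy as np


def kronecker_message_passing(theta_T, theta_S, X):
    """Apply (theta_T kron theta_S) to X without building the PN x PN map.

    theta_T: (P, P) temporal attention, theta_S: (N, N) spatial attention,
    X: (P, N, C) system state; node (p, n) sits at flat index p*N + n.
    """
    # Temporal mixing: P x P times P x N per channel -> O(P^2 N C)
    H = np.einsum("pq,qmc->pmc", theta_T, X)
    # Spatial mixing: N x N applied at every time step -> O(N^2 P C)
    return np.einsum("nm,pmc->pnc", theta_S, H)


def dense_message_passing(theta_T, theta_S, X):
    """Reference: materialize the full PN x PN map, O(P^2 N^2 C)."""
    P, N, C = X.shape
    theta_ST = np.kron(theta_T, theta_S)
    return (theta_ST @ X.reshape(P * N, C)).reshape(P, N, C)


rng = np.random.default_rng(0)
P, N, C = 12, 20, 2  # hypothetical sizes
theta_T = rng.random((P, P))
theta_S = rng.random((N, N))
X = rng.standard_normal((P, N, C))
Y = kronecker_message_passing(theta_T, theta_S, X)
print(Y.shape)  # (12, 20, 2)
```

For illustrative sizes such as a 12-step history on a 325-sensor network (the size of PEMS-BAY), the dense product costs P²N² = 15,210,000 multiply-adds per channel, while the factorized form costs P²N + N²P = 1,314,300, roughly 11.6 times fewer.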

Paper Structure

This paper contains 114 sections, 95 equations, 18 figures, and 9 tables.

Figures (18)

  • Figure 1: Illustration of Weaver framing spatiotemporal forecasting as a bulk prediction problem in a quasi-static Time-expanded Network (TEN) setting. Weaver analyzes traffic patterns within the input system state $\bm{\mathcal{X}} \in \mathbb{R}^{P \times N \times C}$, defined over the history block $(t-P+1, t-P+2, \ldots, t-1, t)$, using its Kronecker Attention module supported by the Traffic Phase Dictionary and State Transition module (see the Weaver superstructure figure). These components learn and apply the traffic network's bulk dynamics on the $PN \times PN$ spatiotemporal attention map to predict the block-level system trajectory, yielding the forecasted system state $\widehat{\bm{\mathcal{Y}}}$ over the forecast block $(t+1, t+2, \ldots, t+Q-1, t+Q)$.
  • Figure 2: Flowchart of concept development in Weaver. For spatiotemporal message passing (STMP), the P$^2$-KMV is discussed in the Kronecker-TEN section and the P$^\Delta$-KMV in the appendix on the generalized P-KMV; signed networks, valence attention, and the Tanimoto coefficient in the signed-graphs section; for model self-conditioning, the sparse manifold transform and the Traffic Phase Dictionary in the self-conditioning section; for KPS, see W-iKPS in the Kronecker Attention module section and the appendix on weighted KPS.
  • Figure 3: Common spatiotemporal graph representations and our Kronecker Time-Expanded Network (Kronecker-TEN, $\mathcal{K}_{10}$). (A) Graph snapshots: each snapshot represents the network at a single time point, with edges connecting nodes only within that time slice. (B) Time-Expanded Networks (TEN): nodes are identified by both location and time, with edges representing displacement across space and time. (C) Kronecker-TEN: applying KPA to the spatiotemporal attention map (STAM), $\Theta_{\mathcal{ST}} = \Theta_{\mathcal{T}} \otimes \Theta_{\mathcal{S}}$, induces a valid spatiotemporal graph via the Kronecker graph product: $\mathcal{G}_{\mathcal{ST}} \approx \mathcal{K}_{10} = \mathcal{G}_{\mathcal{T}} \otimes \mathcal{G}_{\mathcal{S}}$. First-order linkage (MP1) uses pairwise dot-product attention, while second-order linkage (MP2) applies the Kronecker attention kernel.
  • Figure 4: Conventional heuristic spatiotemporal processing in classical and contemporary models. (A) Axis alternation swaps processing between temporal and spatial axes, requiring multiple iterations to emulate spatiotemporal graph diffusion. (B) Split-and-weld performs separate spatial and temporal analysis, then welds them via concatenation or cross-attention prior to parametric mixing, similarly requiring multiple iterations to emulate spatiotemporal graph diffusion.
  • Figure 5: Latent graph representations induced by dot-product kernels $\varphi$ (e.g., dot-product attention, cosine similarity) between receiver (r, query) and sender (s, key) nodes. In typical attention, node routing vectors $\mathbf{u}_r$ and $\mathbf{u}_s$ (typically node features) are projected by routing weights $\mathbf{W}_{q}$ and $\mathbf{W}_{k}$, parameterizing a family of possible node-to-node relations. For each node pair, the kernel $\varphi(\mathbf{q},\mathbf{k})$ instantiates a latent edge by producing an edge affinity (strength and polarity); stacking all affinities over node pairs, i.e., $\varphi(\mathbf{Q}, \mathbf{K})$, yields attention maps $\Theta$. The latent graph can be further shaped by augmenting the routing vectors, for example by appending structural encodings $\mathring{\mathbf{b}} \in \mathbb{R}^{N \times M_B}$.
  • ...and 13 more figures
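The kernel view in Figure 5, combined with the abstract's Valence Attention, can be illustrated by using the continuous Tanimoto coefficient T(q, k) = q·k / (‖q‖² + ‖k‖² − q·k) as the kernφ. For real vectors T lies in [−1/3, 1], equals 1 only when q = k, and is negative for opposing vectors, so each latent edge carries both strength and polarity. A minimal sketch, assuming the signed map is used without a softmax (softmax outputs are strictly positive and would erase negative edges); the function name, `eps`, and the random routing weights are hypothetical, and Weaver's actual scaling, normalization, and structural encodings are not shown.

```python
import numpy as np


def tanimoto_attention(Q, K, eps=1e-8):
    """Signed attention map from the continuous Tanimoto coefficient.

    Q: (R, d) receiver/query vectors, K: (S, d) sender/key vectors.
    Entry [r, s] = q.k / (|q|^2 + |k|^2 - q.k), which lies in [-1/3, 1].
    eps (hypothetical) only guards the 0/0 case of two zero vectors.
    """
    dot = Q @ K.T                                  # (R, S) pairwise dot products
    q_sq = np.sum(Q * Q, axis=1, keepdims=True)    # (R, 1) squared query norms
    k_sq = np.sum(K * K, axis=1, keepdims=True).T  # (1, S) squared key norms
    return dot / (q_sq + k_sq - dot + eps)


rng = np.random.default_rng(0)
N, d_in, d = 5, 8, 4
U = rng.standard_normal((N, d_in))    # node routing vectors u (one row per node)
W_q = rng.standard_normal((d_in, d))  # hypothetical random routing weights
W_k = rng.standard_normal((d_in, d))
theta_S = tanimoto_attention(U @ W_q, U @ W_k)  # signed N x N spatial map
print(theta_S.round(2))
```

Because both factors of a Kronecker product built from two such maps lie in [−1/3, 1], every entry of Θ_T ⊗ Θ_S stays in that range too, so the signed structure carries over to the Kronecker-TEN.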