
Less is More: Strategic Expert Selection Outperforms Ensemble Complexity in Traffic Forecasting

Walid Guettala, Yufan Zhao, László Gulyás

TL;DR

This work tackles traffic forecasting by integrating explicit road-network topology into a mixture-of-experts framework. It introduces TESTAM+, which pairs a new SpatioSemantic Expert with a memory-based routing mechanism, enabling dynamic, topology-aware spatial modeling while keeping forecasting parallel and non-autoregressive. Empirically, the best TESTAM+ configuration (Identity + Adaptive experts) achieves state-of-the-art MAE on METR-LA ($2.99$ vs. $3.38$ for MegaCRN), and the Adaptive and SpatioSemantic experts each reach $1.63$ MAE on PEMS-BAY on their own. Reduced expert sets also cut latency by $53.1\%$ to $61.7\%$ compared with full ensembles, demonstrating that fewer, carefully designed experts can outperform larger ensembles. The findings advocate domain-aware expert design and efficient routing in MoE architectures to enable real-time deployment in complex urban networks.
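The relative MAE reductions quoted for these results can be checked with a few lines of Python. The helper name `relative_reduction` is hypothetical; the MAE pairs are the ones reported in the TL;DR and abstract.

```python
def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Percentage by which new_mae is lower than baseline_mae (for MAE, lower is better)."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae

# (baseline MAE, new MAE) pairs as reported in the TL;DR and abstract
pairs = {
    "METR-LA: Identity + Adaptive vs. MegaCRN": (3.38, 2.99),
    "METR-LA: TESTAM+ vs. TESTAM": (3.14, 3.10),
    "PEMS-BAY: TESTAM+ vs. TESTAM": (1.72, 1.65),
}
for label, (baseline, new) in pairs.items():
    print(f"{label}: {relative_reduction(baseline, new):.1f}%")
# METR-LA: Identity + Adaptive vs. MegaCRN: 11.5%
# METR-LA: TESTAM+ vs. TESTAM: 1.3%
# PEMS-BAY: TESTAM+ vs. TESTAM: 4.1%
```

The denominator is the baseline's MAE; this convention reproduces the quoted 11.5%, 1.3%, and 4.1% figures, whereas dividing by the new MAE would give a larger number (about 13.0% for 2.99 vs. 3.38).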

Abstract

Traffic forecasting is fundamental to intelligent transportation systems, enabling congestion mitigation and emission reduction in increasingly complex urban environments. While recent graph neural network approaches have advanced spatio-temporal modeling, existing mixture-of-experts frameworks such as the Time-Enhanced Spatio-Temporal Attention Model (TESTAM) do not explicitly incorporate physical road-network topology, which limits their spatial modeling capability. We present TESTAM+, an enhanced spatio-temporal forecasting framework that introduces a novel SpatioSemantic Expert, which integrates physical road topology with data-driven feature similarity through hybrid graph construction. TESTAM+ improves on TESTAM, reducing MAE by 1.3% on METR-LA (3.10 vs. 3.14) and by 4.1% on PEMS-BAY (1.65 vs. 1.72). Through comprehensive ablation studies, we find that strategic expert selection outperforms naive ensemble aggregation. Individual experts are highly effective on their own: the Adaptive Expert achieves 1.63 MAE on PEMS-BAY, outperforming the original three-expert TESTAM (1.72 MAE), and the SpatioSemantic Expert matches it with the same 1.63 MAE. The best configuration, Identity + Adaptive, reduces MAE by 11.5% relative to the state-of-the-art MegaCRN on METR-LA (2.99 vs. 3.38) while cutting inference latency by 53.1% compared with the full four-expert TESTAM+. Our findings reveal that fewer, strategically designed experts outperform complex multi-expert ensembles, establishing new state-of-the-art performance with superior computational efficiency for real-time deployment.
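The abstract names hybrid graph construction but does not give the fusion rule here. Purely as an illustration, the NumPy sketch below assumes one simple form: a convex combination of the row-normalized road adjacency (with self-loops) and a k-nearest-neighbour cosine-similarity graph over node features. The function names and the `k` and `alpha` parameters are hypothetical, and since the Figure 1 caption describes the SpatioSemantic Expert as combining static and learnable priors, this fixed, non-learned version covers only the static idea.

```python
import numpy as np

def row_normalize(a: np.ndarray) -> np.ndarray:
    """Scale each row to sum to 1; rows without any edges stay all-zero."""
    deg = a.sum(axis=1, keepdims=True)
    return np.divide(a, deg, out=np.zeros_like(a), where=deg > 0)

def knn_similarity_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical data-driven graph: keep each node's k most cosine-similar other nodes."""
    norms = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), 1e-12)
    unit = features / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)                  # a node is not its own neighbour
    top = np.argsort(-sim, axis=1)[:, :k]           # indices of the k largest similarities per row
    rows = np.arange(sim.shape[0])[:, None]
    graph = np.zeros_like(sim)
    graph[rows, top] = np.maximum(sim[rows, top], 0.0)  # keep only non-negative weights
    return graph

def hybrid_adjacency(road_adj: np.ndarray, features: np.ndarray,
                     k: int = 2, alpha: float = 0.5) -> np.ndarray:
    """Assumed fusion: alpha * road-topology prior + (1 - alpha) * similarity prior."""
    a_road = row_normalize(road_adj + np.eye(road_adj.shape[0]))  # add self-loops
    a_sem = row_normalize(knn_similarity_graph(features, k))
    return alpha * a_road + (1.0 - alpha) * a_sem

# Toy example: four sensors along one road (a chain) with 2-D feature summaries
road = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
A = hybrid_adjacency(road, feats, k=2, alpha=0.5)
print(A.round(3))
```

In this toy case sensor 0 is linked to sensor 1 by both the road and its features, so that edge dominates its row, while sensor 3 is neither road-adjacent nor among sensor 0's two most similar sensors and receives zero weight.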

Paper Structure

This paper contains 26 sections, 10 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Overview of the TESTAM+ pipeline. Input features are first augmented with TIM and then fed in parallel to four experts: Identity (pure temporal modeling, no spatial structure), Adaptive (learned graph), Attention (attention-based graph), and SpatioSemantic (hybrid static and learnable spatial priors). Each expert applies its own spatial structure (none, in the case of Identity) and extracts spatio-temporal features. A memory-query similarity gating module then computes dot-product scores between a pooled input summary and trainable expert keys to perform top-1 routing, and the selected expert's output is fused to produce the traffic forecast. The red dashed box marks the SpatioSemantic Expert added in TESTAM+; without it, the figure depicts the original TESTAM.
  • Figure 2: The four-expert spatial modeling block in TESTAM+. Black lines denote spatial connectivity, and red lines indicate information flow along those connections. The Identity Expert focuses solely on temporal dependencies, with no spatial edges. The Adaptive Expert learns a static graph to capture recurring spatial relations. The Attention Expert infers spatial connectivity dynamically via attention to model non-recurring interactions. The SpatioSemantic Expert integrates physical road topology and data-driven similarity to construct a hybrid spatial graph.
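The memory-query gating described in the Figure 1 caption can be sketched in NumPy. This is an illustrative sketch, not the TESTAM+ implementation: it assumes the pooled summary is a mean over the time axis computed separately for each sensor (so each sensor is routed independently), that expert keys live in the same feature space as the input, and that "fusing" means assembling each sensor's forecast from its selected expert. Training of the keys is omitted, and `top1_route` and its argument names are hypothetical.

```python
import numpy as np

def top1_route(x, expert_keys, expert_outputs):
    """Hypothetical top-1 memory-query routing.

    x:              (N, T, C) input window for N sensors, T time steps, C features
    expert_keys:    (E, C) one key per expert (trainable parameters in a real model)
    expert_outputs: (E, N, H) each expert's forecast over an H-step horizon
    Returns the chosen expert per sensor, the gate probabilities, and the routed forecast.
    """
    query = x.mean(axis=1)                        # (N, C): pooled summary per sensor (assumption)
    scores = query @ expert_keys.T                # (N, E): dot-product similarity to each key
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # softmax over experts
    choice = scores.argmax(axis=1)                # (N,): top-1 expert index per sensor
    routed = expert_outputs[choice, np.arange(x.shape[0])]  # (N, H): each sensor's chosen forecast
    return choice, probs, routed

# Toy setup: four experts with 2-D keys (the order of keys is arbitrary here)
keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
x = np.zeros((3, 4, 2))          # 3 sensors, 4 time steps, 2 features
x[0, :, 0] = 2.0                 # sensor 0 aligns with key 0
x[1, :, 1] = 3.0                 # sensor 1 aligns with key 1
x[2, :, 0] = -1.0                # sensor 2 aligns with key 2
outputs = np.arange(4.0)[:, None, None] * np.ones((4, 3, 2))  # expert e predicts the constant e
choice, probs, routed = top1_route(x, keys, outputs)
print(choice)   # [0 1 2]
```

Only one expert's output is used per sensor, which is why dropping experts that are rarely selected can reduce latency without changing the routing logic itself.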