Continual Traffic Forecasting via Mixture of Experts

Sanghyun Lee; Chanyoung Park

TL;DR

This work tackles continual learning for traffic forecasting in evolving networks where new sensors are added over time. It introduces Traffic Forecasting Mixture of Experts (TFMoE), which forms $K$ homogeneous sensor groups and assigns an expert (a reconstructor via a VAE and a predictor) to each group, enabling specialized learning and reduced forgetting. The method integrates a reconstruction-based gating mechanism, a graph-structure learner for predictors, and three forgetting-mitigation strategies: reconstruction-based knowledge consolidation, forgetting-resilient sampling, and reconstruction-based replay. Evaluations on the PEMSD3-Stream dataset show TFMoE achieves superior accuracy and robustness to forgetting with minimal access to past data, indicating strong practical potential for continual traffic forecasting in expanding networks.

Abstract

Real-world traffic networks expand as new sensors are installed, so traffic patterns continually evolve over time. Incrementally training a model only on the newly added sensors causes it to forget past knowledge (catastrophic forgetting), while retraining the model on the entire network to capture these changes is highly inefficient. To address these challenges, we propose a novel Traffic Forecasting Mixture of Experts (TFMoE) for traffic forecasting under evolving networks. The main idea is to segment the traffic flow into multiple homogeneous groups and assign an expert model to each group. Each expert can then concentrate on learning and adapting to a specific set of patterns, with minimal interference from the other experts during training. This prevents the dilution or replacement of prior knowledge, which is a major cause of catastrophic forgetting. Through extensive experiments on a real-world long-term streaming network dataset, PEMSD3-Stream, we demonstrate the effectiveness and efficiency of TFMoE. Our results show superior forecasting performance and resilience to catastrophic forgetting in continual learning for traffic flow forecasting on long-term streaming networks.
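The routing idea described above (each sensor is handled by the expert whose group it fits best, with the fit judged by reconstruction quality) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: each expert's VAE reconstructor is replaced by a fixed prototype pattern, the reconstruction error is a plain mean squared error, and the gate is a softmax over negative errors with a hypothetical `temperature` parameter. The function names are invented here, and the paper's actual gating mechanism may differ.

```python
import math


def reconstruction_errors(window, prototypes):
    """Mean squared error between one sensor's window and each expert's output.

    Assumption: each expert "reconstructs" by returning a fixed prototype,
    a stand-in for the VAE reconstructor named in the TL;DR.
    """
    return [
        sum((x - p) ** 2 for x, p in zip(window, proto)) / len(window)
        for proto in prototypes
    ]


def gate_weights(errors, temperature=1.0):
    """Softmax over negative errors: lower error gives a higher weight."""
    scaled = [-e / temperature for e in errors]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [v / total for v in exps]


def route(window, prototypes, temperature=1.0):
    """Return the index of the best-fitting expert and the soft gate weights."""
    errors = reconstruction_errors(window, prototypes)
    weights = gate_weights(errors, temperature)
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, weights


if __name__ == "__main__":
    # Two hypothetical experts: a low-flow group and a high-flow group.
    prototypes = [[10.0, 10.0, 10.0, 10.0], [50.0, 50.0, 50.0, 50.0]]
    new_sensor_window = [48.0, 52.0, 49.0, 51.0]
    best, weights = route(new_sensor_window, prototypes)
    print("assigned expert:", best)
    print("gate weights:", weights)
```

In this toy setting a newly added sensor whose readings resemble the high-flow prototype is routed to the second expert, so the first expert's parameters would not need to change to accommodate it.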

Paper Structure (33 sections, 3 equations, 9 figures, 16 tables)

Introduction
Related Work
Problem Definition
Proposed Method: TFMoE
Pre-training Stage
Reconstruction-Based Clustering
Constructing Experts
Training Reconstructor of Expert
Localized Adaptation Stage
Training Predictor of Expert
Reconstruction-Based Knowledge Consolidation
Forgetting-Resilient Sampling
Reconstruction-Based Replay
Experiments
Experimental Results
...and 18 more sections

Figures (9)

Figure 1: The top-left plot depicts the t-SNE visualization of one week's traffic patterns gathered from each sensor in the traffic network, with newly added sensors of the next year marked by stars. The top-right image shows the geographical location of a new sensor and its closest counterpart in the latent space. The bottom plot indicates the notable similarity between the traffic patterns obtained from these two sensors.
Figure 2: Architecture of Pre-training Stage.
Figure 3: Architecture of Localized Adaptation Stage.
Figure 4: MAE of traffic flow forecasting, averaged over time horizons from 2011 to 2017, with 1-sigma error bars.
Figure 5: Comparison of sensor allocation to experts over the years (2011-2017) (a) With consolidation loss and (b) Without consolidation loss.
...and 4 more figures
