Federated Learning for Traffic Flow Prediction with Synthetic Data Augmentation

Fermin Orozco, Pedro Porto Buarque de Gusmão, Hongkai Wen, Johan Wahlström, Man Luo

TL;DR

This paper addresses the problem of learning accurate traffic flow predictors across multiple organisations without sharing their raw data. It proposes FedTPS, a cross-silo FL framework that first trains a federated diffusion-based trajectory generator to produce synthetic data reflecting the global distribution. The synthetic data is then used to augment each client's local dataset during federated traffic flow prediction, which mitigates data heterogeneity and enlarges the training set. A novel Graph Attention with Temporal Attention Unit (GATAU) model is introduced for spatio-temporal modelling, and the diffusion-based trajectory generator (TFDiff) is trained within a FedAvg-based framework. Experimental results on real-world ride-sharing data from Chengdu and Xi'an show that FedTPS consistently outperforms other FL baselines in global model performance, with faster convergence when pre-training is used and robust improvements across varying client counts. The approach reduces heterogeneity between data silos and offers a practical route to privacy-preserving, scalable traffic prediction in intelligent transportation systems.

Abstract

Deep-learning-based traffic prediction models require vast amounts of data to learn embedded spatial and temporal dependencies. The inherent privacy and commercial sensitivity of such data have encouraged a shift towards decentralised data-driven methods, such as Federated Learning (FL). Under a traditional Machine Learning paradigm, traffic flow prediction models can capture spatial and temporal relationships within centralised data. In reality, traffic data is likely distributed across separate data silos owned by multiple stakeholders. In this work, a cross-silo FL setting is motivated to facilitate stakeholder collaboration for optimal traffic flow prediction applications. This work introduces an FL framework, referred to as FedTPS, which trains a diffusion-based trajectory generation model through FL and uses it to generate synthetic data that augments each client's local dataset. The proposed framework is evaluated on a large-scale real-world ride-sharing dataset using various FL methods and Traffic Flow Prediction models, including a novel prediction model we introduce, which leverages Temporal and Graph Attention mechanisms to learn the Spatio-Temporal dependencies embedded within regional traffic flow data. Experimental results show that FedTPS outperforms multiple other FL baselines with respect to global model performance.
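
To make the two ingredients described above concrete, the following is a minimal sketch of (a) standard FedAvg aggregation, where the server averages client parameters weighted by local sample counts, and (b) augmenting a client's local dataset with shared synthetic samples. It is not the authors' implementation: the names `fedavg` and `augment`, the flat-list parameter format, and the toy numbers are all assumptions made for illustration, and the diffusion model itself is not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter format: layer name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: Sequence[Weights], client_sizes: Sequence[int]) -> Weights:
    """Standard FedAvg step: average parameters weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


def augment(local_samples: Sequence, synthetic_samples: Sequence) -> list:
    """Illustrative augmentation: concatenate local and synthetic samples."""
    return list(local_samples) + list(synthetic_samples)


# Toy example (assumed values): two clients holding 1 and 3 samples.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
sizes = [1, 3]
global_weights = fedavg(clients, sizes)
print(global_weights)  # {'w': [2.5, 3.5]}
```

In this sketch the client with more samples pulls the global parameters towards its own values, which is the usual FedAvg behaviour. How synthetic and real samples are actually mixed in FedTPS is not specified in this summary; plain concatenation is used here only as a placeholder.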

Paper Structure

This paper contains 22 sections, 11 equations, 7 figures, 7 tables, 2 algorithms.

Figures (7)

  • Figure 1: Federated framework for the traffic flow prediction task within an ITS. Each organisation collects data from its own vehicle fleet and, through FL, the organisations collaboratively train a model to estimate traffic flow between regions (shown as orange grid cells).
  • Figure 2: FedTPS Framework for training a federated generative model and subsequent traffic flow prediction model. The first stage, shown as the purple dashed line, trains the federated diffusion model to develop the FedTPS synthetic dataset. The second stage, shown as the orange dashed line, trains the federated traffic flow prediction model with the client's local dataset, as well as the synthetic data. $X^{\textnormal{CO}}_c$ represents client $c$'s conditional observations required for the TFDiff model.
  • Figure 3: General model architecture comparison of TAU and GATAU.
  • Figure 4: Architecture of the proposed GATAU model.
  • Figure 5: Satellite images of Chengdu (left) and Xi'an (right), with real sample trajectories shown in cyan representing $X_c$, and where traffic flow into the red regions represents $Q_c$.
  • ...and 2 more figures