Federated Learning for Traffic Flow Prediction with Synthetic Data Augmentation
Fermin Orozco, Pedro Porto Buarque de Gusmão, Hongkai Wen, Johan Wahlström, Man Luo
TL;DR
This paper addresses the challenge of learning accurate traffic flow predictors across multiple organizations without sharing raw data. It proposes FedTPS, a cross-silo federated learning (FL) framework that first trains a federated diffusion-based trajectory generator to produce synthetic data reflecting the global distribution. The synthetic data is then used to augment each client’s local dataset during federated traffic flow prediction, mitigating heterogeneity and expanding the training data. A novel Graph Attention with Temporal Attention Unit (GATAU) model is introduced for spatio-temporal modelling, and diffusion-based data generation is implemented via a FedAvg-based training framework (TFDiff). Experimental results on real-world ride-sharing data from Chengdu and Xi’an show that FedTPS consistently outperforms other FL baselines in global model performance, with faster convergence when pre-training is used and robust improvements across varying client counts. The approach reduces heterogeneity across data silos and offers a practical route to privacy-preserving, scalable traffic prediction in intelligent transportation systems.
Abstract
Deep-learning-based traffic prediction models require vast amounts of data to learn embedded spatial and temporal dependencies. The inherent privacy and commercial sensitivity of such data has encouraged a shift towards decentralised data-driven methods, such as Federated Learning (FL). Under a traditional Machine Learning paradigm, traffic flow prediction models can capture spatial and temporal relationships within centralised data. In reality, however, traffic data is likely distributed across separate data silos owned by multiple stakeholders. In this work, a cross-silo FL setting is motivated to facilitate stakeholder collaboration for optimal traffic flow prediction applications. This work introduces an FL framework, referred to as FedTPS, to generate synthetic data to augment each client's local dataset by training a diffusion-based trajectory generation model through FL. The proposed framework is evaluated on a large-scale real-world ride-sharing dataset using various FL methods and Traffic Flow Prediction models, including a novel prediction model we introduce, which leverages Temporal and Graph Attention mechanisms to learn the Spatio-Temporal dependencies embedded within regional traffic flow data. Experimental results show that FedTPS outperforms multiple other FL baselines with respect to global model performance.
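The second stage described above (each client augments its local data with shared synthetic data, then the server aggregates the client updates) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy one-dimensional linear model stands in for the traffic flow predictor, the synthetic samples are assumed to have already been produced by the federated generator, and the function names (`fedavg`, `augment`, `local_step`, `federated_round`) are hypothetical. Weighting the average by the augmented dataset size is also an assumption made for illustration.

```python
def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def augment(local_data, synthetic_data):
    """Assumed augmentation: local samples followed by the shared synthetic set."""
    return list(local_data) + list(synthetic_data)


def local_step(weights, data, lr=0.1):
    """One gradient step of a toy model y = w*x + b under mean squared error."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_round(global_weights, client_datasets, synthetic_data):
    """One round: each client trains on augmented data, the server averages."""
    updates, sizes = [], []
    for local in client_datasets:
        data = augment(local, synthetic_data)
        updates.append(local_step(global_weights, data))
        sizes.append(len(data))
    return fedavg(updates, sizes)


# Hypothetical usage: two clients with skewed local data, one shared synthetic set.
clients = [[(0.1, 0.2), (0.2, 0.4)], [(0.8, 1.6), (0.9, 1.8)]]
synthetic = [(0.5, 1.0)]
weights = [0.0, 0.0]
for _ in range(5):
    weights = federated_round(weights, clients, synthetic)
```

Because every client sees the same synthetic samples, the augmented local datasets overlap more than the raw ones do, which is the intuition behind the heterogeneity reduction claimed above.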
