Privacy-Preserving Data Fusion for Traffic State Estimation: A Vertical Federated Learning Approach
Qiqing Wang, Kaidi Yang
TL;DR
The paper tackles privacy concerns in traffic state estimation (TSE) by proposing FedTSE, a vertical federated learning framework that fuses loop-detector data held by municipal authorities (MAs) with trajectory features held by mobility providers (MPs) without sharing raw data. It also introduces FedTSE-PI, a physics-informed variant that uses traffic flow models to improve data efficiency when ground-truth labels are scarce, and that relies on secure inner-product encryption to protect MP data during gradient computations. Case studies on real-world and simulated urban networks show that FedTSE nearly matches oracle performance and that FedTSE-PI approaches the accuracy of privacy-unrestricted benchmarks while preserving privacy, with notable gains when MPs share higher-resolution data. The work demonstrates that privacy-preserving data fusion can meaningfully enhance TSE, and it highlights future directions in differential privacy, zero-knowledge verification, uncertainty quantification, and reinforcement learning for traffic control. The practical impact lies in enabling collaboration between separate data owners for intelligent transportation system (ITS) applications without compromising sensitive operational data or customer information.
Abstract
This paper proposes a privacy-preserving data fusion method for traffic state estimation (TSE). Unlike existing works that assume all data sources to be accessible to a single trusted party, we explicitly address data privacy concerns that arise in the collaboration and data sharing between multiple data owners, such as municipal authorities (MAs) and mobility providers (MPs). To this end, we propose a novel vertical federated learning (FL) approach, FedTSE, that enables multiple data owners to collaboratively train and apply a TSE model without having to exchange their private data. To enhance the applicability of the proposed FedTSE in common TSE scenarios with limited availability of ground-truth data, we further propose a privacy-preserving physics-informed FL approach, i.e., FedTSE-PI, that integrates traffic models into FL. Validation on real-world data shows that the proposed methods can protect privacy while achieving accuracy similar to that of an oracle method that disregards privacy considerations.
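To make the vertical FL setting concrete, the following is a minimal sketch of the general idea, not the authors' FedTSE implementation. It assumes a toy linear model in which two parties hold different feature columns for the same samples; the function name `train_vertical_linear`, the synthetic data, and the hyperparameters are all hypothetical. The shared residual is exchanged in plaintext here purely for illustration, whereas the paper protects such exchanges cryptographically.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def train_vertical_linear(x_ma, x_mp, y, lr=0.1, epochs=300):
    """Toy vertical FL for a linear model (illustrative assumption only).

    x_ma: feature rows held by the MA; x_mp: feature rows held by the MP.
    Each party keeps its own features and weights; only partial predictions
    and the resulting residual are combined. No encryption is applied here.
    """
    n = len(y)
    w_ma = [0.0] * len(x_ma[0])  # weights kept by the MA
    w_mp = [0.0] * len(x_mp[0])  # weights kept by the MP
    losses = []
    for _ in range(epochs):
        # Each party computes a partial prediction from its private features.
        z_ma = [dot(row, w_ma) for row in x_ma]
        z_mp = [dot(row, w_mp) for row in x_mp]
        # Partial predictions are summed and compared with the labels.
        residual = [a + b - t for a, b, t in zip(z_ma, z_mp, y)]
        losses.append(sum(r * r for r in residual) / n)
        # Each party updates only its own weights from the shared residual.
        for j in range(len(w_ma)):
            grad = 2.0 / n * sum(r * row[j] for r, row in zip(residual, x_ma))
            w_ma[j] -= lr * grad
        for j in range(len(w_mp)):
            grad = 2.0 / n * sum(r * row[j] for r, row in zip(residual, x_mp))
            w_mp[j] -= lr * grad
    return w_ma, w_mp, losses


# Synthetic, noise-free data: 200 samples, 3 MA features, 2 MP features.
random.seed(0)
true_ma, true_mp = [1.0, -2.0, 0.5], [3.0, -1.0]
x_ma = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
x_mp = [[random.gauss(0, 1) for _ in range(2)] for _ in range(200)]
y = [dot(a, true_ma) + dot(b, true_mp) for a, b in zip(x_ma, x_mp)]

w_ma, w_mp, losses = train_vertical_linear(x_ma, x_mp, y)
```

In this toy setup the two weight vectors jointly recover the generating coefficients even though neither party ever sees the other's feature columns. A real deployment would replace the plaintext exchange of partial predictions and residuals with a protected protocol, since those quantities can themselves leak information.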
