Machine Unlearning of Traffic State Estimation and Prediction
Xin Wang, R. Tyrrell Rockafellar, Xuegang Ban
TL;DR
The paper tackles privacy and data-fidelity concerns in data-driven traffic state estimation and prediction (TSEP) by introducing a constrained machine unlearning framework that removes the influence of forgotten data without full retraining. It formulates unlearning as a sensitivity-analysis problem on data weights within constrained optimization, and derives a tractable auxiliary quadratic program to compute parameter updates. The method is demonstrated on traffic state estimation with support vector machine (SVM) and physics-informed neural network (PINN) models, showing that unlearned models closely match models retrained from scratch (the gold standard) while delivering substantial computational savings. This approach enhances privacy, robustness, and efficiency in TSEP pipelines, with potential extensions to streaming data and adversarial-attack defense.
Abstract
Data-driven traffic state estimation and prediction (TSEP) relies heavily on data sources that contain sensitive information. While the abundance of data has fueled significant breakthroughs, particularly in machine learning-based methods, it also raises concerns regarding privacy, cybersecurity, and data freshness. These issues can erode public trust in intelligent transportation systems. Recently, regulations have introduced the "right to be forgotten", allowing users to request the removal of their private data from models. Because machine learning models can retain information from the data they were trained on, simply removing that data from back-end databases is insufficient. To address these challenges, this study introduces a novel learning paradigm for TSEP, Machine Unlearning TSEP, which enables a trained TSEP model to selectively forget privacy-sensitive, poisoned, or outdated data. By empowering models to "unlearn," we aim to enhance the trustworthiness and reliability of data-driven TSEP.
