Machine Unlearning of Traffic State Estimation and Prediction

Xin Wang, R. Tyrrell Rockafellar, Xuegang Ban

TL;DR

The paper tackles privacy and data-fidelity concerns in data-driven TSEP by introducing a constrained machine unlearning framework that removes the influence of forgotten data without full retraining. It formulates unlearning as a sensitivity-analysis problem on data weights within constrained optimization and derives a tractable auxiliary quadratic program to compute parameter updates. The method is demonstrated on traffic state estimation with a support vector machine (SVM) and a physics-informed neural network (PINN), showing that unlearned models closely match the retrained gold-standard models while delivering substantial computational savings. This approach enhances privacy, robustness, and efficiency in TSEP pipelines, with potential extensions to streaming data and adversarial-attack defense.
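To make the data-weight sensitivity idea concrete, the sketch below applies it to a much simpler setting than the paper's: unconstrained ridge regression. This is an assumption made for illustration, not the paper's constrained formulation or its auxiliary quadratic program. Each sample carries a weight, forgetting a sample moves its weight from 1 to 0, and the parameter change is estimated to first order from the Hessian of the full-data objective. The function names `fit_ridge` and `unlearn_first_order`, the synthetic data, and the choice of forgotten samples are all hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Minimize sum_i (x_i^T theta - y_i)^2 + lam * ||theta||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def unlearn_first_order(X, y, theta, forget_idx, lam):
    """First-order estimate of the parameters after forgetting some samples.

    Treat each sample as having weight w_i = 1. The sensitivity of the optimum
    to w_i is d(theta)/d(w_i) = -H^{-1} grad_i(theta), with H the Hessian of
    the full-data objective. Setting w_i from 1 to 0 for the forgotten samples
    therefore gives theta_new ~= theta + H^{-1} * sum_i grad_i(theta).
    """
    d = X.shape[1]
    H = 2.0 * (X.T @ X + lam * np.eye(d))        # Hessian of the full objective
    Xr, yr = X[forget_idx], y[forget_idx]
    g = 2.0 * Xr.T @ (Xr @ theta - yr)           # summed gradients of forgotten losses
    return theta + np.linalg.solve(H, g)


# Synthetic data (illustrative only, not traffic data).
rng = np.random.default_rng(0)
n, d, lam = 500, 4, 1.0
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.3 * rng.normal(size=n)

forget_idx = np.arange(5)                        # samples to forget
keep_idx = np.setdiff1d(np.arange(n), forget_idx)

theta_full = fit_ridge(X, y, lam)                           # original model
theta_retrained = fit_ridge(X[keep_idx], y[keep_idx], lam)  # retrained reference
theta_unlearned = unlearn_first_order(X, y, theta_full, forget_idx, lam)

gap_before = np.linalg.norm(theta_full - theta_retrained)
gap_after = np.linalg.norm(theta_unlearned - theta_retrained)
print(f"distance to retrained model before unlearning: {gap_before:.2e}")
print(f"distance to retrained model after unlearning:  {gap_after:.2e}")
```

In this toy setting the update costs one linear solve instead of a refit, and the unlearned parameters land closer to the retrained ones than the original parameters do. Constraints and non-quadratic losses, as in the SVM and PINN cases, are where a more careful treatment such as the paper's becomes necessary.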

Abstract

Data-driven traffic state estimation and prediction (TSEP) relies heavily on data sources that contain sensitive information. While the abundance of data has fueled significant breakthroughs, particularly in machine learning-based methods, it also raises concerns regarding privacy, cybersecurity, and data freshness. These issues can erode public trust in intelligent transportation systems. Recently, regulations have introduced the "right to be forgotten", allowing users to request the removal of their private data from models. Because machine learning models can remember old data, simply removing it from back-end databases is insufficient in such systems. To address these challenges, this study introduces a novel learning paradigm for TSEP, Machine Unlearning TSEP, which enables a trained TSEP model to selectively forget privacy-sensitive, poisoned, or outdated data. By empowering models to "unlearn," we aim to enhance the trustworthiness and reliability of data-driven TSEP.

Paper Structure

This paper contains 26 sections, 62 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Illustration of $\bar{\theta}_{-D^r}$ estimation without retraining
  • Figure 2: Comparison of Decision Boundaries: Original SVM, Unlearned Model, and Gold Standard Model
  • Figure 3: Visualization of the vehicle velocity field on I-80 from NGSIM
  • Figure 4: Illustration of trajectory removal and its impact on observed velocity. (a) All trajectories. (b) Subset of removed trajectories. (c) Change in average velocity in affected spatiotemporal bins due to trajectory removal.
  • Figure 5: Predicted velocity field differences after model unlearning and retraining (12660 trajectories removed).
  • ...and 1 more figure