Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases

Ziyao Cui, Minxing Zhang, Jian Pei

TL;DR

The work addresses privacy risks in sequential data publishing, showing that even when each release satisfies a privacy bound $\lambda$, temporal correlations can reveal private information. It introduces a Bi-Directional Hidden Markov Model enhanced with reinforcement learning (EPRL) to exploit sequential dependencies and infer private trajectories from published regions, using a T2P mapping and IoU-based rewards. Experimental results on Geolife, Porto Taxi, and SynMob datasets demonstrate substantial improvements over independent-release baselines, highlighting a fundamental risk and the need for temporally aware privacy frameworks. The findings motivate defense strategies such as time-aware differential privacy, trajectory-to-publication mapping randomization, and dynamic privacy budgets to mitigate cross-release inferences across domains beyond mobility data.
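To make the IoU-based reward concrete, here is a minimal sketch of intersection-over-union between two regions. It assumes regions are axis-aligned rectangles given as `(x_min, y_min, x_max, y_max)`; that representation and the function name `region_iou` are illustrative assumptions, not details taken from the paper, whose actual reward may be defined differently.

```python
def region_iou(a, b):
    """Intersection-over-union of two axis-aligned rectangles.

    Each rectangle is (x_min, y_min, x_max, y_max). This shape is an
    assumption for illustration; the paper may define regions differently.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    # Degenerate (zero-area) inputs get a score of 0 rather than an error.
    return inter / union if union > 0 else 0.0


# Hypothetical predicted region vs. published region.
score = region_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A score near 1 means the two regions nearly coincide, and a score of 0 means they do not overlap, which is why IoU is a natural candidate for a bounded reward signal.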

Abstract

Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed, and shared across diverse domains such as healthcare, finance, and mobility. While prior research has focused on protecting privacy in a single data release, many real-world systems operate under sequential or continuous data publishing, where the same or related data are released over time. Such sequential disclosures introduce new vulnerabilities, as temporal correlations across releases may enable adversaries to infer sensitive information that remains hidden in any individual release. In this paper, we investigate whether an attacker can compromise privacy in sequential data releases by exploiting dependencies between consecutive publications, even when each individual release satisfies standard privacy guarantees. To this end, we propose a novel attack model that captures these sequential dependencies by integrating a Hidden Markov Model with a reinforcement learning-based bi-directional inference mechanism. This enables the attacker to leverage both earlier and later observations in the sequence to infer private information. We instantiate our framework in the context of trajectory data, demonstrating how an adversary can recover sensitive locations from sequential mobility datasets. Extensive experiments on Geolife, Porto Taxi, and SynMob datasets show that our model consistently outperforms baseline approaches that treat each release independently. The results reveal a fundamental privacy risk inherent to sequential data publishing, where individually protected releases can collectively leak sensitive information when analyzed temporally. These findings underscore the need for new privacy-preserving frameworks that explicitly model temporal dependencies, such as time-aware differential privacy or sequential data obfuscation strategies.
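The idea of using both earlier and later releases can be illustrated with textbook forward-backward smoothing on a toy Hidden Markov Model. The sketch below is not the authors' Bi-Directional HMM-RL algorithm. It is a standard technique shown under assumed values: three hidden cells on a line, two overlapping published regions, and hand-picked transition and emission probabilities, all of which are hypothetical.

```python
def forward_backward(init, trans, emit, obs):
    """Smoothed posteriors P(state_t | all observations) for a discrete HMM.

    init[i]     : prior probability of hidden state i at t = 0
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of observing o while in state i
    obs         : list of observation indices
    Assumes the observation sequence has non-zero probability.
    """
    n, T = len(init), len(obs)
    # Forward pass: alpha[t][i] proportional to P(state_t = i, obs[0..t]).
    alpha = []
    for t in range(T):
        if t == 0:
            raw = [init[i] * emit[i][obs[0]] for i in range(n)]
        else:
            raw = [
                emit[j][obs[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                for j in range(n)
            ]
        z = sum(raw)
        alpha.append([r / z for r in raw])
    # Backward pass: beta[t][i] proportional to P(obs[t+1..] | state_t = i).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        raw = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
        z = sum(raw)
        beta[t] = [r / z for r in raw]
    # Combine both directions and normalize per time step.
    posterior = []
    for t in range(T):
        raw = [alpha[t][i] * beta[t][i] for i in range(n)]
        z = sum(raw)
        posterior.append([r / z for r in raw])
    return posterior


# Toy setup (all values are assumptions for illustration).
# Hidden cells 0, 1, 2; region 0 covers cells {0, 1}, region 1 covers {1, 2}.
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
emit = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
posterior = forward_backward(init, trans, emit, obs=[0, 1])
```

In this toy case, seeing region 0 alone leaves cell 0 twice as likely as cell 1 at the first step. Once the later release of region 1 is taken into account, the two cells become equally likely, because cell 1 is more compatible with the subsequent move. That shift is a small-scale version of the cross-release leakage the abstract describes.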

Paper Structure

This paper contains 32 sections, 7 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: An illustration of sequentially published coarse-grained trajectories.
  • Figure 2: Overview of Bi-Directional HMM-RL Algorithm. $TL$ denotes the sequence of true locations, and $PR$ denotes the sequence of published regions.
  • Figure 3: Geographic Visualization of One Trajectory in Geolife with Deviation Parameter $d=2$.
  • Figure 4: Euclidean distance between the predicted and ground-truth true locations of one trajectory without EPRL across 10 passes for the Geolife dataset.
  • Figure 5: Euclidean distance between the predicted and ground-truth true locations of one trajectory with EPRL across 10 passes for the Geolife dataset.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Example 1: Motivation