Demystifying Trajectory Recovery From Ash: An Open-Source Evaluation and Enhancement

Nicholas D'Silva, Toran Shahi, Øyvind Timian Dokk Husveg, Adith Sanjeeve, Erik Buchholz, Salil S. Kanhere

TL;DR

This paper reproduces Xu et al.'s trajectory recovery attack on aggregated mobility data using two open-source datasets, GeoLife and Porto Taxi, to ensure transparency and reproducibility. It introduces enhancements, including a bigram transition matrix, refined cost functions, and online, incremental linking, that raise accuracy by up to about 16% and allow the attack to run in a streaming fashion. The results confirm that privacy leakage persists under aggregation but cast doubt on the extremity of prior claims, with accuracies reaching up to 54% on GeoLife and 32% on Porto Taxi, highlighting the importance of stronger privacy protections. Overall, the work provides open-source tooling, a stronger benchmarking baseline, and guidance for future research in trajectory privacy and defense.

Abstract

Once analysed, location trajectories can provide valuable insights beneficial to various applications. However, such data is also highly sensitive, rendering it susceptible to privacy risks in the event of mismanagement, for example, revealing an individual's identity, home address, or political affiliations. Hence, ensuring that privacy is preserved for this data is a priority. One commonly taken measure to mitigate this concern is aggregation. Previous work by Xu et al. shows that trajectories are still recoverable from anonymised and aggregated datasets. However, the study lacks implementation details, obfuscating the mechanisms of the attack. Additionally, the attack was evaluated on commercial non-public datasets, rendering the results and subsequent claims unverifiable. This study reimplements the trajectory recovery attack from scratch and evaluates it on two open-source datasets, detailing the preprocessing steps and implementation. Results confirm that privacy leakage still exists despite common anonymisation and aggregation methods but also indicate that the initial accuracy claims may have been overly ambitious. We release all code as open-source to ensure the results are entirely reproducible and, therefore, verifiable. Moreover, we propose a stronger attack by designing a series of enhancements to the baseline attack. These enhancements improve accuracy by up to 16%, providing an improved benchmark for future research in trajectory recovery methods. Our improvements also enable online execution of the attack, allowing partial attacks on larger datasets previously considered unprocessable, thereby furthering the extent of privacy leakage. The findings emphasise the importance of using strong privacy-preserving mechanisms when releasing aggregated mobility data and not solely relying on aggregation as a means of anonymisation.

Paper Structure (17 sections, 7 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Location grid with cell centres (red) and user locations (blue) applied to the Porto Taxi dataset.
  • Figure 2: Accuracies on the baseline (left) and enhanced (right) attacks.
  • Figure 3: Top-$k$ uniqueness values for $1 \le k \le 5$. The ground truth and outputs of the baseline and enhanced attacks are shown for each sub-dataset.
  • Figure 4: Recovery errors on the baseline and enhanced attacks.
  • Figure 5: Two examples of predicted trajectories and their matched true trajectories over a single day.
  • ...and 1 more figures