Demystifying Trajectory Recovery From Ash: An Open-Source Evaluation and Enhancement
Nicholas D'Silva, Toran Shahi, Øyvind Timian Dokk Husveg, Adith Sanjeeve, Erik Buchholz, Salil S. Kanhere
TL;DR
This paper reproduces Xu et al.'s trajectory recovery attack on aggregated mobility data using the open-source GeoLife and Porto Taxi datasets, ensuring transparency and reproducibility. It introduces enhancements (a bigram transition matrix, refined cost functions, and online, incremental linking) that improve accuracy by up to 16% and allow the attack to run in a streaming fashion. The results confirm that privacy leakage persists under aggregation but suggest that prior accuracy claims may have been overstated: accuracies reach up to 54% on GeoLife and 32% on Porto Taxi. These findings highlight the importance of stronger privacy protections. Overall, the work provides open-source tooling, a stronger benchmarking baseline, and guidance for future research on trajectory privacy and defense.
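A bigram transition matrix records, for each spatial cell, the empirical probability of moving to each next cell, estimated from consecutive pairs of cells in observed trajectories. The sketch assumes that such a matrix is turned into a linking cost via negative log-probability, so that likely moves are cheap and unseen moves are expensive. The function names, the string cell IDs, and the `floor` smoothing for unseen transitions are hypothetical choices for illustration, not the paper's definitions.

```python
from collections import Counter, defaultdict
from math import log


def bigram_transition_matrix(trajectories):
    """Estimate P(next cell | current cell) from consecutive cell pairs.

    Hypothetical helper: each trajectory is a list of cell IDs in time order.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        # zip pairs each cell with the one that follows it
        for current, nxt in zip(traj, traj[1:]):
            counts[current][nxt] += 1
    # Normalise each row so outgoing probabilities from a cell sum to 1
    return {
        current: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for current, nexts in counts.items()
    }


def transition_cost(matrix, current, nxt, floor=1e-6):
    """Negative log-probability of a move; unseen moves fall back to `floor` (assumed smoothing)."""
    return -log(matrix.get(current, {}).get(nxt, floor))


if __name__ == "__main__":
    matrix = bigram_transition_matrix([["A", "B", "C"], ["A", "B", "D"], ["B", "C"]])
    # B -> C was seen twice and B -> D once, so B -> C is the cheaper link
    print(transition_cost(matrix, "B", "C") < transition_cost(matrix, "B", "D"))
```

In a linking step, a term like this could be added to a spatial distance cost with some weight; how the paper combines its cost terms is not stated here.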
Abstract
Once analysed, location trajectories can provide valuable insights for a wide range of applications. However, such data is also highly sensitive, making it susceptible to privacy risks if mismanaged, for example, by revealing an individual's identity, home address, or political affiliations. Hence, preserving the privacy of this data is a priority. One commonly taken measure to mitigate this concern is aggregation. Previous work by Xu et al. shows that trajectories are still recoverable from anonymised and aggregated datasets. However, that study lacks implementation details, obscuring the mechanisms of the attack. Additionally, the attack was evaluated on commercial, non-public datasets, rendering the results and subsequent claims unverifiable. This study reimplements the trajectory recovery attack from scratch and evaluates it on two open-source datasets, detailing the preprocessing steps and implementation. Results confirm that privacy leakage still exists despite common anonymisation and aggregation methods, but also indicate that the original accuracy claims may have been overly ambitious. We release all code as open-source so that the results are fully reproducible and, therefore, verifiable. Moreover, we propose a stronger attack by designing a series of enhancements to the baseline attack. These enhancements improve accuracy by up to 16%, providing a stronger benchmark for future research on trajectory recovery methods. Our improvements also enable online execution of the attack, allowing partial attacks on larger datasets previously considered unprocessable and thereby exposing further privacy leakage. The findings emphasise the importance of applying strong privacy-preserving mechanisms when releasing aggregated mobility data, rather than relying solely on aggregation as a means of anonymisation.
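To make the threat model concrete, the following toy sketch shows how aggregation drops user identifiers, releasing only the set of occupied locations per time slot, and how a slot-by-slot linking step can still re-chain those locations into trajectories. It assumes that linking is framed as a linear assignment whose cost is the Euclidean distance between each partial trajectory's last point and each location in the next slot. Brute force over permutations stands in for an efficient solver such as the Hungarian algorithm, the coordinates are made up, and the names `aggregate`, `best_assignment`, and `recover` are hypothetical. This is not the paper's implementation and omits its enhanced cost functions.

```python
from itertools import permutations
from math import dist

# Made-up ground truth: each user's (x, y) position at time slots 0, 1, 2.
TRUE_TRAJECTORIES = {
    "u1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "u2": [(0.0, 5.0), (0.0, 4.0), (0.0, 3.0)],
    "u3": [(5.0, 5.0), (5.0, 4.0), (4.0, 4.0)],
}


def aggregate(trajectories):
    """Release only the unordered locations per time slot, dropping user IDs."""
    num_slots = len(next(iter(trajectories.values())))
    return [sorted(traj[t] for traj in trajectories.values()) for t in range(num_slots)]


def best_assignment(cost):
    """Exact linear assignment by brute force; cost[i][j] links trajectory i to location j.

    Returns perm with perm[i] = location index for trajectory i. Only practical for a
    handful of users; real attacks would use a polynomial-time assignment solver.
    """
    n = len(cost)
    return min(permutations(range(n)), key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))


def recover(aggregated):
    """Extend trajectories one slot at a time, minimising total step length per slot."""
    recovered = [[loc] for loc in aggregated[0]]
    for locations in aggregated[1:]:
        # Distance from every partial trajectory's tail to every candidate location
        cost = [[dist(traj[-1], loc) for loc in locations] for traj in recovered]
        perm = best_assignment(cost)
        for i, traj in enumerate(recovered):
            traj.append(locations[perm[i]])
    return recovered


if __name__ == "__main__":
    recovered = recover(aggregate(TRUE_TRAJECTORIES))
    print(sorted(recovered) == sorted(TRUE_TRAJECTORIES.values()))  # True
```

Recovery succeeds here only because the toy users stay far apart; when paths cross or converge, a pure distance cost can swap identities, which is the kind of ambiguity that richer cost terms are generally meant to reduce.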
