
Investigating Vulnerabilities of GPS Trip Data to Trajectory-User Linking Attacks

Benedikt Ströbl, Alexandra Kapp

TL;DR

This paper demonstrates that GPS trip datasets lacking explicit user IDs remain vulnerable to re-identification through a novel trajectory-user linking attack tailored to single trips. The attack combines trip concatenation, home-location assignment, and TF-IDF-based visitation pattern matching to cluster trips by inferred users, and is evaluated against two real-world datasets (freemove and GeoLife). Results show substantial re-identification risk for a meaningful fraction of users, with truncation-based obfuscation providing unreliable protection across datasets. The work establishes a practical baseline for mobility privacy assessments and highlights the need for holistic privacy safeguards beyond merely removing identifiers.
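
The TL;DR names TF-IDF-based visitation pattern matching but does not spell it out. Below is a minimal sketch, assuming each trip group is treated as a "document" whose "words" are discretized visited locations (hypothetical grid-cell labels). Locations are weighted by how rarely they occur across groups, using the smoothed IDF that scikit-learn's `TfidfVectorizer` applies by default, and groups are matched by cosine similarity. The representation, the function names (`tfidf_vectors`, `cosine`, `best_match`) and the toy data are our assumptions, not the paper's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Map each profile name to a sparse TF-IDF vector over its visited locations.

    `profiles` maps a (hypothetical) trip-group name to a list of discretized
    location labels, e.g. grid cells of trip start and end points.
    """
    n = len(profiles)
    df = Counter()
    for cells in profiles.values():
        df.update(set(cells))  # document frequency: in how many profiles a cell occurs
    # Smoothed IDF: cells that occur in few profiles get a larger weight.
    idf = {cell: math.log((1 + n) / (1 + count)) + 1 for cell, count in df.items()}
    vectors = {}
    for name, cells in profiles.items():
        tf = Counter(cells)  # raw visit counts, turned into relative frequencies below
        vectors[name] = {cell: (c / len(cells)) * idf[cell] for cell, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def best_match(query, vectors):
    """Return the profile most similar to `query`, excluding the query itself."""
    return max(
        (name for name in vectors if name != query),
        key=lambda name: cosine(vectors[query], vectors[name]),
    )


# Invented toy data: trip_x shares the rarely visited "cafe_rare" only with group_a.
profiles = {
    "group_a": ["home_1", "office_7", "cafe_rare", "office_7"],
    "group_b": ["home_2", "office_7", "gym_3"],
    "trip_x": ["cafe_rare", "office_7"],
}
vectors = tfidf_vectors(profiles)
print(best_match("trip_x", vectors))  # -> group_a
```

In this toy example the location visited by every group ("office_7") carries the lowest weight, so the shared rare location decides the match. This illustrates the intuition behind the abstract's finding that users who frequent locations few others visit are more exposed.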

Abstract

Open human mobility data is considered an essential basis for the in-depth research and analysis required for the transition to sustainable mobility and sustainable urban planning. Cycling data in particular has been the focus of data collection efforts in recent years. Although the privacy risks of location data are widely known, practitioners often refrain from advanced privacy mechanisms to avoid utility losses. In this context, removing user identifiers from trips is deemed a major privacy gain, as it supposedly prevents linking single trips to reconstruct entire movement patterns. In this paper, we propose a novel attack to reconstruct user identifiers in GPS trip datasets consisting of single trips, unlike previous attacks, which evaluate trajectory-user linking in the context of check-in data. We evaluate the remaining privacy risk for users in such datasets. Our empirical findings from two real-world datasets show that the risk of re-identification is significant even when personal identifiers have been removed, and that truncation as a simple additional privacy mechanism may not be effective in protecting user privacy. Further investigations indicate that users who frequently visit locations that only a small number of others visit tend to be more vulnerable to re-identification.
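
The abstract names truncation without describing it; the Figure 3 caption adds only that the truncation radius is drawn from $U\{a,b\}$ with $a = 100$ m and $b = 300$ m. Below is a minimal sketch, assuming truncation cuts the leading points that lie within the radius of the trip's start point and the trailing points within the radius of its end point, with an independent radius drawn for each end. The cut-prefix/suffix interpretation, the haversine distance, and the names `haversine_m` and `truncate_trip` are our assumptions, not the paper's implementation.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def truncate_trip(points, rng, a=100.0, b=300.0):
    """Cut the start and end of a trip within radii drawn uniformly from [a, b] meters.

    Assumed behavior: leading points within r_start of the start point and
    trailing points within r_end of the end point are dropped.
    """
    if not points:
        return []
    r_start, r_end = rng.uniform(a, b), rng.uniform(a, b)
    start, end = points[0], points[-1]
    i = 0
    while i < len(points) and haversine_m(points[i], start) <= r_start:
        i += 1  # skip leading points near the (sensitive) start point
    j = len(points)
    while j > i and haversine_m(points[j - 1], end) <= r_end:
        j -= 1  # skip trailing points near the (sensitive) end point
    return points[i:j]


# Hypothetical trip heading north in steps of about 111 m.
trip = [(52.52 + 0.001 * k, 13.40) for k in range(21)]
kept = truncate_trip(trip, random.Random(42))
print(len(trip) - len(kept), "points removed")
```

Trips shorter than the two radii are removed entirely by this sketch; a real pipeline would have to decide whether to drop or keep such trips.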

Paper Structure

This paper contains 25 sections, 4 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Example illustration of the LCSS metric for a pair of trips: (A) the original trips and (B) the same pair with the upper trip reversed. The LCSS value in (A) is 0.5, since 50% of the points of the shorter trip lie within the distance threshold $LCSS_{\varepsilon}$ of their counterparts in the longer trip. Analogously, the LCSS in (B) is 0.25: because one of the trips now runs in the opposite direction, only 25% of the points are within $LCSS_{\varepsilon}$ when the points are compared in reversed order. For readability, the red lines marking point pairs farther apart than the threshold are drawn for only two example pairs.
  • Figure 2: Flowchart of the home-location assignment procedure for concatenated trips.
  • Figure 3: Obfuscation technique applied to conceal sensitive start points (SPs) and end points (EPs) of trips. The truncation radius is drawn from the uniform distribution $U\{a,b\}$ with $a = 100\,\mathrm{m}$ and $b = 300\,\mathrm{m}$.
  • Figure 4: Attack evaluation procedure with information on four random points known to the attacker. In this case, the four known points belong to three different clusters. The attacker thus assumes the corresponding 8 trips to belong to User 1, though only 6 are correct (true positives), resulting in a precision of $\frac{6}{8}=0.75$. One trajectory of User 1 is missed (a false negative), resulting in a recall of $\frac{6}{7}\approx 0.86$.
  • Figure 5: Incremental performance across the individual heuristics of the attack, measured by (A) adjusted mutual information (AMI) and (B) adjusted Rand index (ARI).
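
Two quantities from the captions of Figures 1 and 4 can be made concrete. Below is a minimal sketch, assuming the standard dynamic-programming LCSS: two points match if their Euclidean distance on planar coordinates (e.g. meters) is at most $LCSS_{\varepsilon}$, no temporal window is applied, and the result is normalized by the length of the shorter trip, as the Figure 1 caption describes. It also computes the per-user precision and recall of Figure 4 from sets of trip IDs. The function names, trips and IDs are invented for illustration and are not the paper's implementation.

```python
import math


def lcss_similarity(trip_a, trip_b, eps):
    """LCSS similarity of two trips given as lists of planar (x, y) points.

    Two points match if their Euclidean distance is at most `eps`; the length of
    the longest common subsequence of matches is divided by the length of the
    shorter trip. No temporal window is applied (an assumption of this sketch).
    """
    n, m = len(trip_a), len(trip_b)
    if n == 0 or m == 0:
        return 0.0
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(trip_a[i - 1], trip_b[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend the common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m] / min(n, m)


def precision_recall(predicted, actual):
    """Precision and recall of the trips the attacker assigns to one user."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


# Invented toy trips: the last two points of trip_b lie 5 m away from trip_a.
trip_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
trip_b = [(0, 0.1), (1, 0.1), (2, 5), (3, 5)]
print(lcss_similarity(trip_a, trip_b, eps=0.5))        # -> 0.5
print(lcss_similarity(trip_a, trip_b[::-1], eps=0.5))  # -> 0.25

# Counts from Figure 4: 8 trips assigned to User 1, 6 of them correct, 7 true trips.
actual = {f"u1_trip{k}" for k in range(7)}
predicted = {f"u1_trip{k}" for k in range(6)} | {"u2_trip0", "u3_trip4"}
print(precision_recall(predicted, actual))  # -> (0.75, 0.8571428571428571)
```

Reversing one trip lowers the score because LCSS only counts matches that occur in the same order in both trips. This is the direction sensitivity that panel (B) of Figure 1 illustrates.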