Knock Knock, Who's There? Membership Inference on Aggregate Location Data
Apostolos Pyrgelis, Carmela Troncoso, Emiliano De Cristofaro
TL;DR
This work formalizes membership inference on aggregate location time-series as a distinguishability game and shows that an adversary with realistic prior knowledge can accurately infer whether a target user contributed to released aggregates, especially when groups are small or mobility patterns are regular. The authors instantiate the distinguishing function with machine learning classifiers, evaluate the attack on two real mobility datasets (TFL and SFC), and quantify privacy loss with AUC-based metrics. They also study differentially private defenses: these yield substantial privacy gains against a passive adversary, but protection drops notably when the adversary mimics the defense using noisy data, and in both cases the added noise reduces utility. The paper gives providers and regulators a practical methodology for assessing privacy risk before data release and for comparing defense strategies in real-world, continual-release settings.
Abstract
Aggregate location data is often used to support smart services and applications, e.g., generating live traffic maps or predicting visits to businesses. In this paper, we present the first study on the feasibility of membership inference attacks on aggregate location time-series. We introduce a game-based definition of the adversarial task, and cast it as a classification problem where machine learning can be used to distinguish whether or not a target user is part of the aggregates. We empirically evaluate the power of these attacks on both raw and differentially private aggregates using two mobility datasets. We find that membership inference is a serious privacy threat, and show how its effectiveness depends on the adversary's prior knowledge, the characteristics of the underlying location data, as well as the number of users and the timeframe on which aggregation is performed. Although differentially private mechanisms can indeed reduce the extent of the attacks, they also yield a significant loss in utility. Moreover, a strategic adversary mimicking the behavior of the defense mechanism can greatly limit the protection they provide. Overall, our work presents a novel methodology geared to evaluate membership inference on aggregate location data in real-world settings and can be used by providers to assess the quality of privacy protection before data release or by regulators to detect violations.
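To make the adversarial task concrete, the following is a minimal, illustrative sketch of the distinguishability game on synthetic data. It is not the authors' implementation. The mobility model, the parameter values, and every function name are assumptions made for this example. A simple overlap score stands in for the trained classifier, and Laplace noise stands in for a differentially private mechanism, since the abstract does not name the mechanisms evaluated. The sketch repeatedly builds aggregates with and without the target user, scores each one, and reports the AUC of those scores.

```python
import random


def simulate_user(rng, n_locations, n_epochs):
    """Hypothetical mobility model: a user is at a 'home' cell 80% of the time."""
    home = rng.randrange(n_locations)
    return [home if rng.random() < 0.8 else rng.randrange(n_locations)
            for _ in range(n_epochs)]


def aggregate(traces, n_locations, n_epochs, rng=None, noise_scale=0.0):
    """Per-epoch, per-location user counts, optionally perturbed with Laplace noise."""
    counts = [[0.0] * n_locations for _ in range(n_epochs)]
    for trace in traces:
        for t, loc in enumerate(trace):
            counts[t][loc] += 1.0
    if noise_scale > 0.0:
        for row in counts:
            for loc in range(n_locations):
                # The difference of two exponentials is Laplace(0, noise_scale).
                row[loc] += (rng.expovariate(1.0 / noise_scale)
                             - rng.expovariate(1.0 / noise_scale))
    return counts


def score(target_trace, counts):
    """Toy distinguishing function: sum of the counts in the cells the target visited."""
    return sum(counts[t][loc] for t, loc in enumerate(target_trace))


def auc(pos, neg):
    """Probability that a random 'in' score beats a random 'out' score (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def run_game(seed=0, n_users=200, group_size=20, n_locations=30,
             n_epochs=24, rounds=300, noise_scale=0.0):
    """Play the game `rounds` times and return the AUC of the toy distinguisher."""
    rng = random.Random(seed)
    users = [simulate_user(rng, n_locations, n_epochs) for _ in range(n_users)]
    target, others = users[0], users[1:]
    in_scores, out_scores = [], []
    for _ in range(rounds):
        # "In" world: the target plus group_size - 1 other users.
        group_in = [target] + rng.sample(others, group_size - 1)
        # "Out" world: group_size users, none of them the target.
        group_out = rng.sample(others, group_size)
        in_scores.append(score(target, aggregate(
            group_in, n_locations, n_epochs, rng, noise_scale)))
        out_scores.append(score(target, aggregate(
            group_out, n_locations, n_epochs, rng, noise_scale)))
    return auc(in_scores, out_scores)


if __name__ == "__main__":
    print("AUC, raw aggregates:  ", run_game())
    print("AUC, noisy aggregates:", run_game(noise_scale=24.0))
```

An AUC near 0.5 means the adversary does no better than guessing, while an AUC near 1 means membership is almost always inferred correctly. Varying `group_size`, `n_epochs`, and `noise_scale` in this toy setting gives a rough feel for the factors the abstract lists, but the resulting numbers say nothing about the paper's actual results.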
