A Zero Auxiliary Knowledge Membership Inference Attack on Aggregate Location Data

Vincent Guan; Florent Guépin; Ana-Maria Cretu; Yves-Alexandre de Montjoye

TL;DR

This work develops the first Zero Auxiliary Knowledge (ZK) membership inference attack (MIA) on aggregate location data, which eliminates the need for an auxiliary dataset of real individual traces. It shows that the ZK MIA remains highly effective even when the adversary knows only a small fraction of their target's location history.

Abstract

Location data is frequently collected from populations and shared in aggregate form to guide policy and decision making. However, the prevalence of aggregated data also raises the privacy concern of membership inference attacks (MIAs), which infer whether an individual's data contributed to the aggregate release. Although effective MIAs have been developed for aggregate location data, they require access to an extensive auxiliary dataset of individual traces over the same locations, collected from a similar population. This assumption is often impractical given common privacy practices surrounding location data. To measure the risk of an MIA performed by a realistic adversary, we develop the first Zero Auxiliary Knowledge (ZK) MIA on aggregate location data, which eliminates the need for an auxiliary dataset of real individual traces. Instead, we propose a novel synthetic approach that generates suitable synthetic traces from the released aggregate. We also develop methods to correct for bias and noise, showing that our synthetic-based attack remains applicable when privacy mechanisms are applied prior to release. Using two large-scale location datasets, we demonstrate that our ZK MIA matches the state-of-the-art Knock-Knock (KK) MIA across a wide range of settings, including popular implementations of differential privacy (DP) and suppression of small counts. Furthermore, we show that the ZK MIA remains highly effective even when the adversary knows only a small fraction (10%) of their target's location history. This demonstrates that effective MIAs can be performed by realistic adversaries, highlighting the need for strong DP protection.
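The core idea of the abstract, replacing real auxiliary traces with synthetic traces derived from the release itself, can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the aggregate counts distinct users per (ROI, epoch) cell, that each synthetic trace draws its visits independently from the space and time marginals estimated from the aggregate, and that the membership classifier is trained on labelled aggregates of synthetic groups built with and without the target's partial trace. All function names and parameters (`synthetic_traces`, `visits_per_trace`, `group_size`, and so on) are hypothetical; the paper's actual generator and classifier may differ.

```python
import random


def aggregate(traces, n_rois, n_epochs):
    """Count, for each (ROI, epoch) cell, how many traces visit it at least once."""
    agg = [[0] * n_epochs for _ in range(n_rois)]
    for trace in traces:
        for roi, epoch in set(trace):  # a user contributes at most 1 per cell
            agg[roi][epoch] += 1
    return agg


def normalize(xs):
    """Clip negatives to 0 and rescale to a probability vector (uniform if all zero)."""
    clipped = [max(x, 0.0) for x in xs]
    total = sum(clipped)
    return [x / total for x in clipped] if total > 0 else [1 / len(xs)] * len(xs)


def marginals(agg):
    """Empirical space (per-ROI) and time (per-epoch) marginals of an aggregate."""
    space = normalize([sum(row) for row in agg])
    time = normalize([sum(col) for col in zip(*agg)])
    return space, time


def synthetic_traces(agg, n_traces, visits_per_trace, rng):
    """Hypothetical generator: sample ROIs and epochs independently from the marginals."""
    space, time = marginals(agg)
    rois, epochs = range(len(space)), range(len(time))
    return [
        list(zip(rng.choices(rois, weights=space, k=visits_per_trace),
                 rng.choices(epochs, weights=time, k=visits_per_trace)))
        for _ in range(n_traces)
    ]


def shadow_dataset(pool, target, group_size, n_samples, n_rois, n_epochs, rng):
    """Labelled training aggregates: label 1 if the target trace is in the group, else 0."""
    data = []
    for i in range(n_samples):
        label = i % 2
        group = rng.sample(pool, group_size - label)  # new list of synthetic traces
        if label:
            group.append(target)
        data.append((aggregate(group, n_rois, n_epochs), label))
    return data


rng = random.Random(0)
released = [[5, 0, 2], [1, 3, 0], [0, 4, 6]]  # toy 3x3 release (ROIs x epochs)
pool = synthetic_traces(released, n_traces=200, visits_per_trace=4, rng=rng)
target = [(0, 0), (2, 2)]  # adversary's partial knowledge of the target's trace
train = shadow_dataset(pool, target, group_size=50, n_samples=10,
                       n_rois=3, n_epochs=3, rng=rng)
```

Any off-the-shelf classifier could then be fitted on `train`; keeping trace generation separate from classifier training mirrors the two stages shown in Figure 3.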

Paper Structure (40 sections, 3 theorems, 15 equations, 17 figures, 11 tables, 4 algorithms)

Introduction
Definitions and Threat Model
Location Traces and Aggregates
Privacy Measures on Location Aggregates
Differential Privacy
Suppression of Small Counts (SSC)
Problem Formulation
Membership Classifier
Threat Model
Related Work
Methodology
Zero Auxiliary Knowledge MIA Framework
Generating Synthetic Traces from Aggregate Location Data
Obtaining Accurate Marginals
Estimating Space and Time Marginals
...and 25 more sections

Key Result

Lemma A.1

Given a fixed geographic region in which location data is collected,

Figures (17)

Figure 1: Adversary's prior knowledge in the previous work, Knock-Knock MIA pyrgelis2017knock (left), and our work, Zero Auxiliary Knowledge MIA (right). The ZK adversary does not require knowledge of location traces of real people to run the MIA.
Figure 2: Example of how suppression of small counts and differential privacy may be applied to an aggregate with 3 ROIs (rows) and 3 epochs (columns).
Figure 3: ZK MIA architecture: $Adv$ first creates synthetic traces, then uses them with the partial target trace to train the membership classifier, before predicting membership.
Figure 4: Log compression for SSC aggregates: SSC biases the estimate obtained from the aggregate by creating more extreme values. The true time marginal from the CDR dataset (plotted for the first week) is better approximated after the empirical estimate from the aggregate ($m=1000$, $k=1$) undergoes log compression $\log(1+\gamma x)$ with $\gamma$ chosen as in (8).
Figure 5: Power transformation for DP aggregates: DP noise compresses the estimate obtained from the aggregate. The true space marginal from the CDR dataset (sorted by popularity) is better approximated after the empirical estimate from the aggregate ($m=1000$, $\frac{\Delta}{\varepsilon} = 1$) undergoes the power transformation $x^p$, with $p$ selected according to Algorithm alg:p_selection.
...and 12 more figures
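Figures 2, 4 and 5 describe two release-time privacy measures and the corrections applied to marginals estimated from protected aggregates. The Python sketch below illustrates them under stated assumptions: suppression of small counts (SSC) is taken to zero out every cell with a count of at most k, DP is the Laplace mechanism with noise scale Δ/ε added independently to each cell, and both corrections renormalize the transformed marginal into a probability vector. The parameters `gamma` and `p` are passed in directly because the paper's Equation (8) and its p-selection algorithm are not reproduced here; the values used below are arbitrary, and all function names are hypothetical.

```python
import math
import random


def suppress_small_counts(agg, k):
    """SSC (assumed rule): set every cell with count <= k to 0."""
    return [[0 if c <= k else c for c in row] for row in agg]


def laplace_release(agg, scale, rng):
    """Add i.i.d. Laplace(0, scale) noise to every cell, e.g. scale = Delta / epsilon.

    A Laplace(0, b) draw equals the difference of two independent exponential
    draws with mean b (rate 1 / b).
    """
    rate = 1.0 / scale
    return [[c + rng.expovariate(rate) - rng.expovariate(rate) for c in row]
            for row in agg]


def normalize(xs):
    """Clip negatives (possible after DP noise) and rescale to sum to 1."""
    clipped = [max(x, 0.0) for x in xs]
    total = sum(clipped)
    return [x / total for x in clipped] if total > 0 else [1 / len(xs)] * len(xs)


def time_marginal(agg):
    """Empirical per-epoch marginal (column sums) of an ROI x epoch aggregate."""
    return normalize([sum(col) for col in zip(*agg)])


def log_compress(marginal, gamma):
    """Concave map log(1 + gamma * x): damps the extreme values SSC introduces."""
    return normalize([math.log1p(gamma * x) for x in marginal])


def power_transform(marginal, p):
    """Map x ** p: with p > 1 it sharpens a marginal flattened by DP noise."""
    return normalize([max(x, 0.0) ** p for x in marginal])


agg = [[5, 0, 2], [1, 3, 0], [0, 4, 6]]  # toy ROI x epoch counts
ssc = suppress_small_counts(agg, k=1)   # -> [[5, 0, 2], [0, 3, 0], [0, 4, 6]]
noisy = laplace_release(agg, scale=1.0, rng=random.Random(0))  # Delta/epsilon = 1

m = time_marginal(ssc)                  # [0.25, 0.35, 0.4]
corrected_ssc = log_compress(m, gamma=10.0)
corrected_dp = power_transform(time_marginal(noisy), p=2.0)
```

Because log(1 + γx) is concave, it pulls the largest entries down and the smallest up, whereas x^p with p > 1 does the opposite; this matches the directions of the biases described in Figures 4 and 5.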

Theorems & Definitions (6)

Definition 1: $\varepsilon$-DP dwork2006calibrating
Definition 2
Definition 3
Lemma A.1
Theorem A.2
Lemma A.3
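For reference, the standard statement of ε-DP (dwork2006calibrating), together with the Laplace mechanism whose noise scale Δ/ε appears in Figure 5, is written out below. The neighbouring relation used here (datasets differing in one individual's data) is the generic one; the paper's Definition 1 may fix it more specifically, for example at the level of a user's whole trace.

```latex
% epsilon-DP: a randomized mechanism M is epsilon-DP if, for all neighbouring
% datasets D ~ D' (differing in one individual's data) and all measurable S,
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]

% Laplace mechanism for a query f : \mathcal{D} \to \mathbb{R}^d with L1 sensitivity Delta:
\Delta = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_1, \qquad
\mathcal{M}(D) = f(D) + (Y_1, \dots, Y_d), \quad
Y_i \overset{\text{iid}}{\sim} \mathrm{Lap}\!\left(\Delta / \varepsilon\right)
```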
