Releasing Large-Scale Human Mobility Histograms with Differential Privacy

Christopher Bian; Albert Cheu; Yannis Guzman; Marco Gruteser; Peter Kairouz; Ryan McKenna; Edo Roth

TL;DR

The paper addresses the release of large-scale histograms of mobility data with user-level differential privacy (DP) for Google's Environmental Insights Explorer (EIE), under strict utility requirements. It introduces an activity+metric scaling mechanism: contributions are pre-scaled on device, Laplace noise is added centrally after aggregation, and post-processing restores the original scale. The mechanism achieves $\epsilon \approx 2$ for the (user, week) unit across millions of histogram entries. On a proxy dataset, it substantially outperforms two baselines, reaching roughly 3% weighted relative error across metrics, which keeps the released statistics useful for emissions estimation and policymaking. The approach may generalize to other group-by-sum workloads in federated, privacy-preserving analytics, and the paper identifies automation and workload-aware adaptation as directions for future work.

Abstract

Environmental Insights Explorer (EIE) is a Google product that reports aggregate statistics about human mobility, including various methods of transit used by people across roughly 50,000 regions globally. These statistics are used to estimate carbon emissions and are provided to policymakers to inform their decisions on transportation policy and infrastructure. Due to the inherent sensitivity of this type of user data, it is crucial that the statistics derived and released from it are computed with appropriate privacy protections. In this work, we use a combination of federated analytics and differential privacy to release these required statistics, while operating under strict error constraints to ensure utility for downstream stakeholders. We propose a new mechanism that achieves $\epsilon \approx 2$-DP while satisfying these strict utility constraints, greatly improving over natural baselines. We believe this mechanism may be of more general interest for the broad class of group-by-sum workloads.

Paper Structure (15 sections, 2 equations, 2 figures, 2 algorithms)

Introduction
Problem Setup
Data
Data Limitations
Workload
Evaluation Criteria
Privacy Goals
Mechanism Design
Key Primitive: The Laplace Mechanism
Baseline Approaches
Our Approach: Activity+Metric Scaling
Empirical Evaluation
Analysis of Results
Lessons Learned and Future Work
Acknowledgments

Figures (2)

Figure 1: A complete overview of the data collection and processing steps, including the "Activity + Metric Scaling Mechanism" (with two sample devices shown). Data starts on device, where it is scaled and clipped locally before being aggregated by the Private Aggregation Service. Once the data is aggregated on the server, we add Laplace noise centrally and perform the post-processing steps of descaling and thresholding, before releasing the private data to downstream emissions calculations and then to relevant stakeholders.
Figure 2: (a) Overall weighted relative error of two baseline mechanisms and our mechanism for varying privacy budgets. (b) Error breakdown of each mechanism for each metric at $\epsilon=2$.
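
The pipeline described in the Figure 1 caption (scale and clip on device, aggregate, add Laplace noise centrally, then descale and threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the dictionary-based histogram, the per-key scale factors, the L1 clipping rule, and the choice to threshold after descaling are all assumptions made for illustration; how the paper actually selects the activity and metric scales is not shown in this summary.

```python
import random
from collections import defaultdict


def scale_and_clip(contribution, scales, l1_bound):
    """Hypothetical on-device step: scale each entry, then shrink the whole
    vector so that its L1 norm is at most l1_bound."""
    scaled = {key: value * scales[key] for key, value in contribution.items()}
    norm = sum(abs(value) for value in scaled.values())
    factor = min(1.0, l1_bound / norm) if norm > 0 else 1.0
    return {key: value * factor for key, value in scaled.items()}


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(contributions, scales, l1_bound, epsilon, threshold, rng):
    """Hypothetical server-side step: sum the clipped contributions, add
    Laplace noise to every key in the fixed domain, descale, and threshold."""
    totals = defaultdict(float)
    for contribution in contributions:
        clipped = scale_and_clip(contribution, scales, l1_bound)
        for key, value in clipped.items():
            totals[key] += value

    # Adding or removing one contribution changes the sum by at most
    # l1_bound in L1 norm, so Laplace noise of scale l1_bound / epsilon
    # gives epsilon-DP for the noisy sums.
    noise_scale = l1_bound / epsilon

    released = {}
    for key in scales:  # noise every key, including empty ones
        noisy = totals[key] + laplace_noise(noise_scale, rng)
        descaled = noisy / scales[key]  # post-processing: undo the scaling
        if descaled >= threshold:       # post-processing: drop small values
            released[key] = descaled
    return released
```

Descaling and thresholding operate only on the noisy sums, so they are post-processing and do not change the privacy guarantee. In this sketch, a key with a larger scale factor receives proportionally less noise after descaling, at the cost of consuming more of each contribution's clipping budget.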
