Releasing Large-Scale Human Mobility Histograms with Differential Privacy

Christopher Bian, Albert Cheu, Yannis Guzman, Marco Gruteser, Peter Kairouz, Ryan McKenna, Edo Roth

TL;DR

The paper addresses the release of large-scale differentially private (DP) histograms of mobility data for Environmental Insights Explorer (EIE) under strict utility requirements. It introduces an activity + metric scaling mechanism that scales and clips contributions on device, adds Laplace noise centrally after aggregation, and descales the result in post-processing, achieving $\epsilon \approx 2$ at the (user, week) privacy unit across millions of histogram entries. On a proxy dataset, the mechanism substantially outperforms two baselines, reaching a weighted relative error near 3% across metrics, which keeps the statistics useful for emissions estimation and policymaking. The approach may generalize to other group-by-sum workloads in federated, privacy-preserving analytics, and the paper points to automation and workload-aware adaptation as future directions.

Abstract

Environmental Insights Explorer (EIE) is a Google product that reports aggregate statistics about human mobility, including various methods of transit used by people across roughly 50,000 regions globally. These statistics are used to estimate carbon emissions and are provided to policymakers to inform their decisions on transportation policy and infrastructure. Due to the inherent sensitivity of this type of user data, it is crucial that the statistics derived and released from it are computed with appropriate privacy protections. In this work, we use a combination of federated analytics and differential privacy to release these required statistics, while operating under strict error constraints to ensure utility for downstream stakeholders. We propose a new mechanism that achieves $\epsilon$-DP with $\epsilon \approx 2$ while satisfying these strict utility constraints, greatly improving over natural baselines. We believe this mechanism may be of more general interest for the broad class of group-by-sum workloads.

Paper Structure (15 sections, 2 equations, 2 figures, 2 algorithms)

Figures (2)

  • Figure 1: A complete overview of the data collection and processing steps, including the "Activity + Metric Scaling Mechanism" (with two sample devices shown). Data originates on device, where it is scaled and clipped locally before being aggregated by the Private Aggregation Service. Once the data is aggregated on the server, we add Laplace noise centrally and perform the post-processing steps of descaling and thresholding, then release the private data to downstream emissions calculations and, from there, to relevant stakeholders.
  • Figure 2: (a) Overall weighted relative error of the two baseline mechanisms and our mechanism for varying privacy budgets. (b) Error breakdown of each mechanism by metric at $\epsilon=2$.
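
The pipeline in the Figure 1 caption (local scaling and clipping, aggregation, central Laplace noise, descaling, thresholding) can be illustrated with a minimal Python sketch. This is not the paper's mechanism: the per-metric factors in `metric_scale`, the L1 clipping rule, the fixed `domain` of (region, metric) entries, the threshold placement after descaling, and all function names are assumptions made for illustration, and the "activity" part of the scaling is not modeled. A floating-point Laplace sampler like the one below is also not suitable for a production DP release.

```python
import random
from collections import defaultdict


def scale_and_clip(contribution, metric_scale, l1_bound):
    """On-device step (assumed form): rescale each metric, then clip the L1 norm."""
    scaled = {
        (region, metric): value * metric_scale[metric]
        for (region, metric), value in contribution.items()
    }
    norm = sum(abs(v) for v in scaled.values())
    shrink = min(1.0, l1_bound / norm) if norm > 0 else 1.0
    return {key: v * shrink for key, v in scaled.items()}


def aggregate(device_reports):
    """Stand-in for the aggregation service: an entry-wise sum over devices."""
    totals = defaultdict(float)
    for report in device_reports:
        for key, value in report.items():
            totals[key] += value
    return totals


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(totals, domain, metric_scale, l1_bound, epsilon, threshold, rng):
    """Server step: noise every entry in the domain, descale, then threshold."""
    released = {}
    for region, metric in domain:
        # Each clipped report has L1 norm at most l1_bound, so Laplace noise
        # with scale l1_bound / epsilon is applied to every histogram entry.
        noisy = totals.get((region, metric), 0.0) + laplace(l1_bound / epsilon, rng)
        descaled = noisy / metric_scale[metric]
        if descaled >= threshold:
            released[(region, metric)] = descaled
    return released
```

Under these assumptions, a report with L1 bound `l1_bound` and `epsilon = 2` receives Laplace noise of scale `l1_bound / 2` per entry before descaling. Noise is added to every entry of the fixed domain, including entries with no contributions, so that the set of released keys depends on the data only through the noisy values.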