Releasing Large-Scale Human Mobility Histograms with Differential Privacy
Christopher Bian, Albert Cheu, Yannis Guzman, Marco Gruteser, Peter Kairouz, Ryan McKenna, Edo Roth
TL;DR
The paper addresses the release of large-scale, user-level differentially private (DP) histograms of mobility data for Environmental Insights Explorer (EIE) under strict utility requirements. It introduces an activity+metric scaling mechanism that pre-scales contributions on device, adds Laplace noise centrally, and restores the original scale in post-processing, achieving $ε\approx 2$ for the (user, week) privacy unit across millions of histogram entries. Empirical results on a proxy dataset show that this method substantially outperforms two baselines, attaining roughly 3% weighted relative error across metrics, which keeps the released statistics usable for emissions estimation and policymaking. The approach may generalize to other group-by-sum workloads in federated, privacy-preserving analytics, and it points to future work on automation and workload-aware adaptation.
Abstract
Environmental Insights Explorer (EIE) is a Google product that reports aggregate statistics about human mobility, including the various methods of transit used by people across roughly 50,000 regions globally. These statistics are used to estimate carbon emissions and are provided to policymakers to inform their decisions on transportation policy and infrastructure. Because this type of user data is inherently sensitive, it is crucial that the statistics derived and released from it are computed with appropriate privacy protections. In this work, we use a combination of federated analytics and differential privacy to release the required statistics while operating under strict error constraints that ensure utility for downstream stakeholders. We propose a new mechanism that achieves $ε\approx 2$-DP while satisfying these strict utility constraints, greatly improving over natural baselines. We believe this mechanism may be of more general interest for the broad class of group-by-sum workloads.
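To make the group-by-sum setting concrete, the following is a minimal sketch of a generic scale, clip, add-noise, and rescale pipeline using the Laplace mechanism. It is not the paper's mechanism: the function names, the per-entry `scales` vector (assumed to be fixed independently of the data), the L1 clipping rule, and the add/remove-one-user neighboring relation are all illustrative assumptions.

```python
import random


def clip_l1(vec, bound):
    """Scale a vector down so that its L1 norm is at most `bound`."""
    norm = sum(abs(x) for x in vec)
    if norm <= bound:
        return list(vec)
    return [x * bound / norm for x in vec]


def laplace(rng, scale):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def scaled_laplace_histogram(user_vectors, scales, l1_bound, epsilon, rng):
    """Hypothetical group-by-sum release with per-entry pre-scaling.

    Assumes `scales` is chosen without looking at the data; otherwise the
    privacy argument below does not apply.
    """
    totals = [0.0] * len(scales)
    for vec in user_vectors:
        # "On-device" step: divide by the per-entry scale, then clip so
        # that one user changes the sum by at most `l1_bound` in L1 norm.
        scaled = [x / s for x, s in zip(vec, scales)]
        for i, x in enumerate(clip_l1(scaled, l1_bound)):
            totals[i] += x
    # Central step: Laplace noise calibrated to L1 sensitivity `l1_bound`.
    noise_scale = l1_bound / epsilon
    noisy = [t + laplace(rng, noise_scale) for t in totals]
    # Post-processing: multiply the scales back; this costs no extra privacy.
    return [n * s for n, s in zip(noisy, scales)]


if __name__ == "__main__":
    rng = random.Random(0)
    users = [[1.0, 10.0], [2.0, 20.0]]
    print(scaled_laplace_histogram(users, [1.0, 10.0], 4.0, 2.0, rng))
```

Under these assumptions, adding or removing one user changes the scaled sum by at most `l1_bound` in L1 norm, so Laplace noise with scale `l1_bound / epsilon` gives ε-DP for the summed vector, and the final rescaling is post-processing. Dividing by `scales` before clipping lets entries with very different magnitudes share one clipping budget more evenly.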
