Privacy-Aware Sequential Learning
Yuxin Liu, M. Amin Rahimian
TL;DR
This work analyzes how endogenous privacy decisions, modeled via metric differential privacy, reshape information dynamics in sequential learning. For continuous signals, a smooth randomized response strategy balances privacy and utility, accelerating learning to a rate of $\Theta_{\varepsilon}(\log n)$ (versus the nonprivate $\Theta(\sqrt{\log n})$) while preserving asymptotic correctness; when $\varepsilon < 2/\sigma^2$, it also ensures a finite expected time to the first correct action and a finite total number of incorrect actions. In the binary-signal setting, privacy introduces randomization that can hinder information transmission, and the cascade thresholds and correct-cascade probabilities are nonmonotone in the privacy budget. Nonetheless, heterogeneity in privacy budgets can recover fast aggregation, achieving $\Theta(\sqrt{n})$ under certain distributions. The paper also extends the analysis to Pufferfish privacy and demonstrates the robustness of endogenous-noise strategies across privacy frameworks. Overall, the findings show that carefully designed privacy mechanisms can enhance learning efficiency in sequential settings, informing platform design and policy for privacy-preserving data aggregation.
Abstract
In settings like vaccination registries, individuals act after observing others, and the resulting public records can expose private information. We study privacy-preserving sequential learning, where agents add endogenous noise to their reported actions to conceal private signals. Efficient social learning relies on information flow, seemingly in conflict with privacy. Surprisingly, with continuous signals and a fixed privacy budget $\varepsilon$, the optimal randomization strategy balances privacy and accuracy, accelerating learning to $\Theta_{\varepsilon}(\log n)$, faster than the nonprivate $\Theta(\sqrt{\log n})$ rate. In the nonprivate baseline, the expected time to the first correct action and the number of incorrect actions diverge; under privacy with sufficiently small $\varepsilon$, both are finite. Privacy helps because, under the false state, agents more often receive signals contradicting the majority; randomization then asymmetrically amplifies the log-likelihood ratio, enhancing aggregation. In heterogeneous populations, an order-optimal $\Theta(\sqrt{n})$ rate is achievable when a subset of agents have low privacy budgets. With binary signals, however, privacy reduces informativeness and impairs learning relative to the nonprivate baseline, though the dependence on $\varepsilon$ is nonmonotone. Our results show how privacy reshapes information dynamics and inform the design of platforms and policies.
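To make the idea of adding noise to a reported action concrete, the following is a minimal sketch of the textbook binary randomized response mechanism, in which an agent reports the opposite of its intended action with probability $1/(1+e^{\varepsilon})$. This is the classic mechanism, not necessarily the smooth randomized response strategy or the metric differential privacy formulation analyzed in the paper; the function names `flip_probability` and `randomized_response` and the $\pm 1$ encoding of actions are assumptions made for illustration.

```python
import math
import random


def flip_probability(epsilon: float) -> float:
    """Flip probability of classic randomized response: 1 / (1 + e^epsilon).

    Assumption: the textbook mechanism, used here only as an illustration.
    """
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(action: int, epsilon: float, rng: random.Random) -> int:
    """Report a binary action (+1 or -1), flipped with probability flip_probability(epsilon)."""
    if action not in (-1, 1):
        raise ValueError("action must be +1 or -1")
    if rng.random() < flip_probability(epsilon):
        return -action  # noisy report: conceal the intended action
    return action  # truthful report
```

Under this mechanism, the ratio of the probability of a truthful report to that of a flipped report is exactly $e^{\varepsilon}$, so a smaller budget $\varepsilon$ means noisier reports: at $\varepsilon = 0$ the report is a fair coin flip, and as $\varepsilon$ grows the report approaches the true action.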
