Privacy-Aware Sequential Learning

Yuxin Liu, M. Amin Rahimian

TL;DR

This work analyzes how endogenous privacy decisions, modeled via metric differential privacy, reshape information dynamics in sequential learning. For continuous signals, a smooth randomized response strategy balances privacy and utility. It accelerates learning to a rate of $\Theta_{\varepsilon}(\log n)$, compared with the nonprivate $\Theta(\sqrt{\log n})$, while preserving asymptotic correctness. When $\varepsilon < 2/\sigma^2$, it also yields a finite expected time to the first correct action and a finite total number of incorrect actions. In the binary-signal setting, privacy introduces randomization that can hinder information transmission, and both the cascade threshold and the probability of a correct cascade are nonmonotone in the privacy budget. Nonetheless, heterogeneity in privacy budgets can recover fast aggregation, achieving $\Theta(\sqrt{n})$ under certain distributions. The paper also extends the analysis to Pufferfish privacy and shows that endogenous-noise strategies remain robust across privacy frameworks. Overall, the findings show that carefully designed privacy mechanisms can improve learning efficiency in sequential settings, informing platform design and policy for privacy-preserving data aggregation.

Abstract

In settings like vaccination registries, individuals act after observing others, and the resulting public records can expose private information. We study privacy-preserving sequential learning, where agents add endogenous noise to their reported actions to conceal private signals. Efficient social learning relies on information flow, seemingly in conflict with privacy. Surprisingly, with continuous signals and a fixed privacy budget $\varepsilon$, the optimal randomization strategy balances privacy and accuracy, accelerating learning to $\Theta_{\varepsilon}(\log n)$, faster than the nonprivate $\Theta(\sqrt{\log n})$ rate. In the nonprivate baseline, the expected time to the first correct action and the number of incorrect actions diverge; under privacy with sufficiently small $\varepsilon$, both are finite. Privacy helps because, under the false state, agents more often receive signals contradicting the majority; randomization then asymmetrically amplifies the log-likelihood ratio, enhancing aggregation. In heterogeneous populations, an order-optimal $\Theta(\sqrt{n})$ rate is achievable when a subset of agents has low privacy budgets. With binary signals, however, privacy reduces informativeness and impairs learning relative to the nonprivate baseline, though the dependence on $\varepsilon$ is nonmonotone. Our results show how privacy reshapes information dynamics and inform the design of platforms and policies.
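The nonprivate baseline mentioned above can be made concrete with a short simulation of the public log-likelihood ratio. This is a minimal sketch under assumptions that this section does not spell out: a binary state $\theta \in \{-1, +1\}$, private signals $s_n \sim \mathcal{N}(\theta, \sigma^2)$, agents who observe all past actions and act myopically on their posterior, and no privacy noise at all. The helper names `prob_action` and `simulate_public_llr` are hypothetical, and the code does not implement the paper's private strategies.

```python
import math
import random


def prob_action(action, llr, theta, sigma):
    """P(a_n = action | theta, public LLR) for a myopic, nonprivate agent.

    Assumed model: s_n ~ N(theta, sigma^2), private LLR = 2 s_n / sigma^2,
    and the agent plays +1 iff llr + 2 s_n / sigma^2 > 0, i.e. iff s_n > t.
    """
    t = -llr * sigma ** 2 / 2.0
    scale = sigma * math.sqrt(2.0)
    if action == 1:
        return 0.5 * math.erfc((t - theta) / scale)  # P(s_n > t)
    return 0.5 * math.erfc((theta - t) / scale)      # P(s_n < t)


def simulate_public_llr(n_agents, theta=1, sigma=1.0, seed=0):
    """Return log P(history | +1) / P(history | -1) after each agent acts."""
    rng = random.Random(seed)
    llr = 0.0
    path = []
    for _ in range(n_agents):
        s = rng.gauss(theta, sigma)
        action = 1 if llr + 2.0 * s / sigma ** 2 > 0 else -1
        # Observers see only the action, so they update on its likelihood
        # under each state, evaluated at the pre-update public LLR.
        llr += math.log(prob_action(action, llr, 1, sigma)
                        / prob_action(action, llr, -1, sigma))
        path.append(llr)
    return path


path = simulate_public_llr(2000, theta=1, sigma=1.0, seed=42)
print(f"public LLR after {len(path)} agents: {path[-1]:.3f}")
```

Each observed action shifts the public log-likelihood ratio by the log of how much more likely that action is under $\theta = +1$ than under $\theta = -1$. Once the public belief is strong, only rare contrarian signals move it, which is the slow nonprivate behavior the abstract contrasts with the private case.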

Paper Structure

This paper contains 27 sections, 28 theorems, 216 equations, 6 figures, 1 table.

Key Result

Theorem 3.4

Before an information cascade occurs, each agent's optimal reporting strategy is a randomized response: the agent reports the opposite of its action with probability $u_n = u(\varepsilon) = \frac{1}{1 + e^{\varepsilon}}$ and reports it truthfully otherwise. Once an information cascade occurs, agents report their actions truthfully.
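Theorem 3.4 can be read as a simple reporting rule. The sketch below assumes actions take values in $\{-1, +1\}$ and interprets $u(\varepsilon)$ as the probability of flipping the reported action; the function names are hypothetical. It also checks that, before a cascade, the ratio of report probabilities under the two possible true actions equals $e^{\varepsilon}$, the largest ratio that the standard definition of $\varepsilon$-local differential privacy allows (Definition 3.1 is listed here only by title).

```python
import math
import random


def flip_probability(eps):
    """u(eps) = 1 / (1 + e^eps), read here as the chance of flipping the report."""
    return 1.0 / (1.0 + math.exp(eps))


def report(action, eps, in_cascade, rng):
    """Reported action under the rule in Theorem 3.4 (sketch).

    Before a cascade, the true action in {-1, +1} is flipped with probability
    u(eps); once a cascade has started, the action is reported truthfully.
    """
    if not in_cascade and rng.random() < flip_probability(eps):
        return -action
    return action


def privacy_ratio(eps):
    """P(report = a | action = a) / P(report = a | action = -a) before a cascade."""
    u = flip_probability(eps)
    return (1.0 - u) / u


rng = random.Random(0)
eps = 0.5
reports = [report(+1, eps, in_cascade=False, rng=rng) for _ in range(100_000)]
empirical_flip = reports.count(-1) / len(reports)
print(f"u(eps) = {flip_probability(eps):.4f}, empirical = {empirical_flip:.4f}")
print(f"likelihood ratio = {privacy_ratio(eps):.4f}, e^eps = {math.exp(eps):.4f}")
```

A flip probability smaller than $u(\varepsilon)$ would push the ratio above $e^{\varepsilon}$ and violate the privacy constraint, while a larger one (up to $1/2$) only adds noise; this is one way to see why the constraint binds in the theorem.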

Figures (6)

  • Figure 1: Probability of correct cascade vs. privacy budget ($\varepsilon$) for different cascade thresholds ($k$). Each colored line represents the probability of a correct cascade for a specific threshold $k$.
  • Figure 2: Probability of action $+1$ given signal $s_n$ (blue), and signal distribution under $\theta = +1$ (orange).
  • Figure 3: Smooth randomized response and the signal likelihoods. (a) shows the signal distribution under two different states ($\theta = -1$ in blue and $\theta = +1$ in orange), with the shaded area representing the probability of choosing $a_{n} = +1$ in each case. The vertical dotted line indicates the threshold location. (b) shows the smooth randomized response function as it varies with $s_{n}$ (black). (c) shows the probability of action changing from $+1$ to $-1$ under the influence of smooth randomized response, represented by the shaded area.
  • Figure 4: The value of $C(\varepsilon)^{-\frac{2}{\varepsilon\sigma^2}} \sum_{n=1}^{\infty} n^{-\frac{2}{\varepsilon\sigma^2}}$ as a function of the privacy budget $\varepsilon$ ($\sigma=\sqrt{2}$).
  • Figure 5: Log-likelihood ratio dynamics under five privacy regimes. Homogeneous settings: conservatives $(\varepsilon=0.1)$, pragmatists $(\varepsilon=0.5)$, and liberals $(\varepsilon=1)$; heterogeneous setting: $\varepsilon \sim U(0,1)$; and a non-private baseline. Signals are normally distributed with $\sigma=1$.
  • ...and 1 more figure
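The series in Figure 4 is finite exactly when $\varepsilon < 2/\sigma^2$, the same condition quoted in the TL;DR: $\sum_{n \ge 1} n^{-p}$ with $p = 2/(\varepsilon\sigma^2)$ is a $p$-series, which converges if and only if $p > 1$. The sketch below checks this numerically. It omits the prefactor $C(\varepsilon)^{-2/(\varepsilon\sigma^2)}$, whose form is not given in this section, and its function names are hypothetical.

```python
import math


def series_exponent(eps, sigma):
    """p = 2 / (eps * sigma**2), the exponent in sum_n n^(-p) from Figure 4."""
    return 2.0 / (eps * sigma ** 2)


def series_converges(eps, sigma):
    """A p-series converges iff p > 1, i.e. iff eps < 2 / sigma**2."""
    return series_exponent(eps, sigma) > 1.0


def partial_sum(eps, sigma, terms):
    """Partial sum of n^(-p) for n = 1..terms (prefactor C(eps) omitted)."""
    p = series_exponent(eps, sigma)
    return sum(n ** (-p) for n in range(1, terms + 1))


sigma = math.sqrt(2.0)  # as in Figure 4, so the threshold is eps < 1
for eps in (0.25, 0.5, 0.9, 1.5):
    print(eps, series_converges(eps, sigma), round(partial_sum(eps, sigma, 10_000), 4))
```

With $\sigma = \sqrt{2}$ the threshold becomes $\varepsilon < 1$; for $\varepsilon \ge 1$ the partial sums keep growing as `terms` increases instead of settling.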

Theorems & Definitions (44)

  • Definition 2.1: $(\varepsilon, d_\mathscr{X})$-mDP for the Sequential Learning Model
  • Definition 3.1: $\varepsilon$-Local Differential Privacy for the Binary Model
  • Definition 3.2: Randomized Response Strategy
  • Definition 3.3: Information Cascade
  • Theorem 3.4: Randomized Response Strategy for the Binary Model
  • Definition 3.5: Information Cascade Threshold $(k)$
  • Theorem 3.6: Probability of the Correct Cascade
  • Theorem 3.7: Probability of Correct Cascade under Heterogeneous Privacy Budget
  • Definition 4.1: Asymptotic Learning
  • Definition 4.2: $(\varepsilon, d_{\mathbb{R}})$-mDP for the Continuous Model
  • ...and 34 more
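Definitions 2.1 and 4.2 are listed here only by title. Assuming the standard form of metric differential privacy, a mechanism $M$ satisfies $(\varepsilon, d)$-mDP if $\Pr[M(x) \in S] \le e^{\varepsilon d(x, x')} \Pr[M(x') \in S]$ for all inputs $x, x'$ and output sets $S$. For a binary report driven by a real-valued signal with $d(s, s') = |s - s'|$, this can be checked on a grid. The logistic response used below is a hypothetical stand-in, chosen because its log-probabilities have slope at most $k$; it is not claimed to be the paper's smooth randomized response.

```python
import math


def logistic_log_probs(k):
    """Hypothetical response P(a = +1 | s) = 1 / (1 + exp(-k s)), in log space."""
    def log_plus(s):
        return -math.log1p(math.exp(-k * s))   # log P(a = +1 | s)

    def log_minus(s):
        return -math.log1p(math.exp(k * s))    # log P(a = -1 | s)

    return log_plus, log_minus


def satisfies_mdp(log_probs, eps, lo=-5.0, hi=5.0, steps=1000, tol=1e-12):
    """Check |log P(a|s) - log P(a|s')| <= eps * |s - s'| on adjacent grid points.

    On the real line, adjacent-point bounds chain together, so passing this
    check implies the bound for every pair of grid points.
    """
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    for log_p in log_probs:
        for s, t in zip(grid, grid[1:]):
            if abs(log_p(t) - log_p(s)) > eps * abs(t - s) + tol:
                return False
    return True


eps = 1.0
print(satisfies_mdp(logistic_log_probs(k=eps), eps))      # slope eps: passes
print(satisfies_mdp(logistic_log_probs(k=3 * eps), eps))  # too steep: fails
```

A hard-threshold response, which plays $+1$ with probability one whenever $s > 0$, fails any such check because one of its log-probabilities is $-\infty$; smoothing the response is what makes a finite $\varepsilon$ attainable.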