On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks

Xiaoguang Li, Zitao Li, Ninghui Li, Wenhai Sun

TL;DR

This work addresses the vulnerability of local differential privacy (LDP) protocols for numerical attributes to data-poisoning attacks. It introduces an attack-driven robustness framework and two metrics, Absolute Shift Gain (ASG) and Shift Gain Ratio (SGR), that enable fair robustness comparisons across CFO-based and distribution-reconstruction mechanisms. Through extensive experiments on real and synthetic data, the authors show that CFOs in the Server setting and the SW distribution-reconstruction method resist manipulation better than CFOs in the User setting, and that hash-domain size and post-processing affect security beyond the traditional privacy-utility trade-off. A zero-shot attack-detection method that combines reconstructed distributions with a KS-test-based hypothesis framework significantly improves detection over prior work, enabling practical defense in hostile environments. The study provides concrete guidance for designing attack-resilient LDP systems and highlights avenues for future work, including robust post-processing, optimal parameter tuning (e.g., hash domain size), and shuffler-enhanced privacy-utility trade-offs.

Abstract

Recent studies reveal that local differential privacy (LDP) protocols are vulnerable to data poisoning attacks, in which an attacker manipulates the final estimate on the server by leveraging the characteristics of LDP and sending carefully crafted data from a small fraction of controlled local clients. This vulnerability raises concerns regarding the robustness and reliability of LDP in hostile environments. In this paper, we conduct a systematic investigation of the robustness of state-of-the-art LDP protocols for numerical attributes, i.e., categorical frequency oracles (CFOs) with binning and consistency, and distribution reconstruction. We evaluate protocol robustness through an attack-driven approach and propose new metrics for cross-protocol attack gain measurement. The results indicate that Square Wave and CFO-based protocols in the Server setting are more robust against the attack than CFO-based protocols in the User setting. Our evaluation also uncovers new relationships between LDP security and its inherent design choices. We find that the hash domain size in local-hashing-based LDP has a profound impact on protocol robustness beyond the well-known effect on utility. Further, we propose a zero-shot attack detection method that leverages the rich reconstructed distribution information. The experiments show that our detection significantly improves on existing methods and effectively identifies data manipulation in challenging scenarios.

Paper Structure (34 sections, 1 theorem, 8 equations, 9 figures, 4 tables, 1 algorithm)

Key Result

Theorem 1

For local-hashing-based CFOs with fixed $\epsilon$, the expected $\mathsf{ASG}$ before post-processing becomes lower (higher) when the hash domain size $g$ is smaller (larger).
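
To make the role of the hash domain size $g$ in Theorem 1 concrete, the following is a minimal Python sketch of a generic local-hashing frequency oracle (in the style of optimized local hashing). It is not the paper's implementation and does not compute $\mathsf{ASG}$; the function names, the BLAKE2-based hash, and the integer value domain are assumptions made for illustration.

```python
import hashlib
import math
import random


def lh_hash(seed: int, value: int, g: int) -> int:
    """Hypothetical seeded hash mapping a value into {0, ..., g-1}."""
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % g


def lh_perturb(value: int, epsilon: float, g: int, rng: random.Random):
    """Client side: hash the value into a domain of size g, then apply
    generalized randomized response over that hashed domain."""
    seed = rng.getrandbits(32)
    x = lh_hash(seed, value, g)
    p = math.exp(epsilon) / (math.exp(epsilon) + g - 1)
    if rng.random() < p:
        y = x  # report the true hashed value
    else:
        # report one of the other g-1 hashed values, chosen uniformly
        y = rng.randrange(g - 1)
        if y >= x:
            y += 1
    return seed, y


def lh_aggregate(reports, domain_size: int, epsilon: float, g: int):
    """Server side: unbiased frequency estimate for each value.
    A report (seed, y) "supports" v when lh_hash(seed, v, g) == y."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + g - 1)
    estimates = []
    for v in range(domain_size):
        support = sum(1 for seed, y in reports if lh_hash(seed, v, g) == y)
        # A non-matching value is supported with probability 1/g,
        # a matching value with probability p.
        estimates.append((support / n - 1.0 / g) / (p - 1.0 / g))
    return estimates
```

In this sketch `g` controls both the range of each report and the scaling factor `1 / (p - 1/g)` that the server applies to every supporting report, which is why it is a natural parameter to examine when reports may be crafted by an attacker.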

Figures (9)

  • Figure 1: Examples of $\mathsf{ASG}$. The difference between the cumulative functions of the original and shifted distributions is positive in the green area and negative in the red area.
  • Figure 2: Attack results with varying $\epsilon$ from $0.1$ to $4$. Each row corresponds to one dataset. The left two columns show $\mathsf{ASG}$ and $\mathsf{SGR}$ with $\beta = 1\%$ and the right two columns depict $\mathsf{ASG}$ and $\mathsf{SGR}$ with $\beta = 5\%$.
  • Figure 3: Attack results with varying $\beta$ from $1\%$ to $7.5\%$. Each row corresponds to one dataset. The left two columns show $\mathsf{ASG}$ and $\mathsf{SGR}$ with $\epsilon = 0.2$ and the right two columns depict $\mathsf{ASG}$ and $\mathsf{SGR}$ with $\epsilon = 0.6$.
  • Figure 4: Relationship between attack efficacy and hash domain size $g$.
  • Figure 5: Attack results on SW mechanism on dataset $\mathcal{N}(0, 10)$, varying $\epsilon$ from $0.1$ to $4$. The fake values are injected into 1) the right-most bin, 2) range $[1+\frac{2b}{3}, 1+b]$, 3) range $[1, 1+b]$ and 4) range $[1-b, 1+b]$.
  • ...and 4 more figures
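
The ranges in the Figure 5 caption refer to the output domain of the Square Wave (SW) mechanism, which reports values in $[-b, 1+b]$ for inputs normalized to $[0, 1]$. The Python sketch below shows one way such a perturbation step can be sampled. It is an illustration rather than the authors' code: `sw_perturb` is a hypothetical name, and the half-width `b` is passed in as a parameter instead of being derived from $\epsilon$.

```python
import math
import random


def sw_perturb(v: float, epsilon: float, b: float, rng: random.Random) -> float:
    """Sample a Square Wave style report for an input v in [0, 1].

    The report lies in [-b, 1 + b]. It has density p inside the window
    [v - b, v + b] and density q = p / e^epsilon elsewhere.
    """
    e = math.exp(epsilon)
    p = e / (2 * b * e + 1)  # density inside the window around v
    # q = 1 / (2 * b * e + 1) is the density outside the window;
    # the outside region has total length 1, so 2*b*p + q = 1.
    if rng.random() < 2 * b * p:
        return rng.uniform(v - b, v + b)
    # Outside region: [-b, v - b) of length v, then (v + b, 1 + b] of length 1 - v.
    u = rng.random()
    if u < v:
        return -b + u
    return v + b + (u - v)
```

Under this sampling rule, a fake report placed near the right edge of the output domain (for example in $[1, 1+b]$) is one that honest users with inputs close to 1 would also produce with high density.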

Theorems & Definitions (4)

  • Definition 1: $\epsilon$-Local Differential Privacy [duchi2013local]
  • Theorem 1
  • proof
  • proof
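
For reference, Definition 1 is commonly stated as follows. This is the standard formulation of $\epsilon$-LDP; the symbols $\Psi$ (the randomized mechanism), $\mathcal{D}$ (the input domain), $v_1, v_2$ (inputs), and $y$ (an output) are notation assumed here and may differ from the paper's.

$$\forall\, v_1, v_2 \in \mathcal{D},\ \forall\, y \in \mathrm{Range}(\Psi): \quad \Pr[\Psi(v_1) = y] \le e^{\epsilon}\, \Pr[\Psi(v_2) = y]$$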