Data Poisoning Attacks to Locally Differentially Private Range Query Protocols

Ting-Wei Liao, Chih-Hsun Lin, Yu-Lin Tsai, Takao Murakami, Chia-Mu Yu, Jun Sakuma, Chun-Ying Huang, Hiroaki Kikuchi

TL;DR

This work studies data poisoning attacks on locally differentially private (LDP) range query protocols and shows that Norm-Sub, a common post-processing step, can greatly amplify an attacker's influence. The authors develop two provably optimal attacks, AoT for tree-based AHEAD and AoG for grid-based HDG, which craft fake user data to maximize the response to a target range query. They also study a potential countermeasure and propose an adaptive attack that evades it. Theoretical analysis and experiments on synthetic and real-world datasets show that, by manipulating a small fraction of users, an attacker gains 5–10x the influence of a normal user on the estimate. The findings expose a significant vulnerability in current LDP range query protocols and motivate defenses and redesigns that balance privacy, utility, and security in decentralized data collection.

Abstract

Local Differential Privacy (LDP) has been widely adopted to protect user privacy in decentralized data collection. However, recent studies have revealed that LDP protocols are vulnerable to data poisoning attacks, where malicious users manipulate their reported data to distort aggregated results. In this work, we present the first study on data poisoning attacks targeting LDP range query protocols, focusing on both tree-based and grid-based approaches. We identify three key challenges in executing such attacks: crafting consistent and effective fake data, maintaining data consistency across levels or grids, and preventing server detection. To address the first two challenges, we propose two novel, provably optimal attack methods, a tree-based attack and a grid-based attack, designed to manipulate range query results with high effectiveness. Our key finding is that the common post-processing procedure, Norm-Sub, in LDP range query protocols can help the attacker massively amplify their attack effectiveness. To address the third challenge, we study a potential countermeasure and propose an adaptive attack capable of evading it. We evaluate our methods through theoretical analysis and extensive experiments on synthetic and real-world datasets. Our results show that the proposed attacks can significantly amplify estimations for arbitrary range queries by manipulating a small fraction of users, providing 5-10x more influence on the estimation than a normal user.
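
Norm-Sub, the post-processing step the abstract singles out, is commonly described as follows: clip negative estimates to zero and shift the remaining ones by a common offset, repeating until every estimate is non-negative and the estimates sum to a fixed total. The Python sketch below follows that description only. The function name `norm_sub` and the toy numbers are illustrative assumptions, and the sketch does not reproduce the paper's attacks or its amplification analysis.

```python
def norm_sub(estimates, total=1.0):
    """Sketch of Norm-Sub: return non-negative values that sum to `total`.

    Assumption: `total` is positive and `estimates` is a non-empty list of
    noisy frequency estimates (possibly negative, possibly not summing to 1).
    """
    active = set(range(len(estimates)))  # bins not yet clipped to zero
    while active:
        # Common offset that makes the active bins sum to `total`.
        delta = (total - sum(estimates[i] for i in active)) / len(active)
        negatives = {i for i in active if estimates[i] + delta < 0}
        if not negatives:
            out = [0.0] * len(estimates)
            for i in active:
                out[i] = estimates[i] + delta
            return out
        # Clip the bins that would go negative and recompute the offset.
        active -= negatives
    return [0.0] * len(estimates)


# Toy input: the third bin has a negative estimate.
print([round(v, 6) for v in norm_sub([0.6, 0.5, -0.1])])  # [0.55, 0.45, 0.0]
```

Here the negative bin is clipped to zero and the two surviving bins share an offset of -0.05, so the output sums to 1. Any inflated report therefore changes the offset applied to every other bin, which is the kind of coupling the key finding is concerned with.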

Paper Structure

This paper contains 60 sections, 3 theorems, 27 equations, 25 figures, 1 table, and 10 algorithms.

Key Result

Theorem 1

Let $L$ be a set of nodes, and let $c_v$ be the tree coefficient of each node $v$ in layer $L$. Arrange the nodes as $v_1, v_2, \dots, v_{|L|}$ so that $c_{v_1} \ge c_{v_2} \ge \dots \ge c_{v_{|L|}}$. Define an assignment $A = \{a_i\}_{i=1}^{|L|}$ with $0 \le a_i \le M_L$. There is an optimal assignment $A^{\text{…}}$ […] where $0 \le c < M_L$. We call this form the potential optimal form.
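
The surviving parts of Theorem 1 (coefficients sorted in descending order, a per-node cap $M_L$, and a single remainder $0 \le c < M_L$) are consistent with a greedy fill: give the cap to the nodes with the largest coefficients, put the remainder on the next node, and leave the rest at zero. The Python sketch below illustrates that reading only. The linear objective $\sum_i c_{v_i} a_i$, the total `budget`, and the function name `greedy_assignment` are assumptions that do not appear in the extracted statement.

```python
def greedy_assignment(coeffs, cap, budget):
    """Greedy fill under assumed constraints 0 <= a_i <= cap, sum(a_i) <= budget.

    Hypothetical helper: returns one a_i per coefficient, in input order.
    """
    # Visit nodes from the largest coefficient to the smallest.
    order = sorted(range(len(coeffs)), key=lambda i: coeffs[i], reverse=True)
    assignment = [0] * len(coeffs)
    remaining = budget
    for i in order:
        if remaining <= 0:
            break
        take = min(cap, remaining)  # full cap, or the remainder c < cap
        assignment[i] = take
        remaining -= take
    return assignment


# Toy layer with four nodes, cap M_L = 3, and an assumed budget of 7.
print(greedy_assignment([0.2, 0.9, 0.5, 0.1], cap=3, budget=7))  # [1, 3, 3, 0]
```

In sorted order the toy result reads $(3, 3, 1, 0)$: two nodes at the cap, one remainder $c = 1$, and zero elsewhere.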

Figures (25)

  • Figure 1: Norm-Sub
  • Figure 2: Poisoned Norm-Sub
  • Figure 3: AHEAD
  • Figure 4: AoT procedure
  • Figure 5: The figures depict two 2-D grids with attributes $(A_i, A_j)$ and $(A_j, A_k)$. Each color represents a distinct hash key, and every cell is assigned a color according to the hash function shown in the top-right corner. Given $w_2 \leq 4$, both the Inclusive and Size constraints are satisfied by using the hash pair $(h_1, \text{$\blacksquare$})$ in the left grid and $(h_2, \text{$\blacksquare$})$ in the right grid. Furthermore, the column constraint is met because each corresponding column range contains the same number of cells $(1,1,2,0)$ in the first, second, third, and fourth columns/rows, respectively.
  • ...and 20 more figures

Theorems & Definitions (7)

  • Definition 1: Local Differential Privacy
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • proof
  • proof
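
For reference, the $\varepsilon$-LDP condition named in Definition 1 is usually stated as below. This is the standard textbook form, with $\mathcal{M}$ denoting the randomized mechanism run on each user's device; the paper's own notation may differ.

```latex
\Pr[\mathcal{M}(x) = y] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(x') = y]
\quad \text{for all inputs } x, x' \text{ and all outputs } y .
```

A smaller $\varepsilon$ makes the output distributions for any two inputs harder to tell apart, which is also what lets a fake user submit an arbitrary report without it looking out of place on its own.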