Federated Heavy Hitter Analytics with Local Differential Privacy

Yuemin Zhang, Qingqing Ye, Haibo Hu

TL;DR

This work addresses federated heavy hitter analytics under local differential privacy in cross-party settings, where non-IID data and high communication costs hinder naive aggregation. It introduces TAP, a target-aligning prefix tree with a shared shallow trie and adaptive extension to align local and global targets under $\epsilon$-LDP, supplemented by a consensus-based pruning layer (TAPS) that propagates prior knowledge between parties in a sequential order. The proposed framework achieves superior utility (F1/NCR) across real and synthetic datasets and remains robust to different frequency oracles, while maintaining practical communication and computation costs. The results demonstrate a scalable, privacy-preserving approach for cross-party heavy hitter discovery with strong theoretical guarantees and broad applicability to federated analytics tasks.

Abstract

Federated heavy hitter analytics enables service providers to better understand the preferences of cross-party users by analyzing the most frequent items. As with federated learning, it faces the challenges of privacy concerns, statistical heterogeneity, and expensive communication. Local differential privacy (LDP), the de facto standard for privacy-preserving data collection, addresses the privacy challenge by letting each user perturb her data locally and report the sanitized version. However, in federated settings, applying LDP complicates the other two challenges, either through the utility loss caused by the injected LDP noise or through the extra communication and computation costs of the perturbation mechanism. To tackle these problems, we propose a novel target-aligning prefix tree mechanism satisfying $\epsilon$-LDP for federated heavy hitter analytics. In particular, we propose an adaptive extension strategy that resolves the inconsistency between covering the necessary prefixes and estimating heavy hitters within a party, thereby enhancing utility. We also present a consensus-based pruning strategy that uses noisy prior knowledge from other parties to further reconcile the inconsistency between finding heavy hitters in each party and providing reasonable frequency information for identifying the global ones. To the best of our knowledge, our study is the first solution to federated heavy hitter analytics in a cross-party setting that satisfies the stringent $\epsilon$-LDP. Comprehensive experiments on both real-world and synthetic datasets confirm the effectiveness of our proposed mechanism.
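
To make the idea of a prefix tree over locally perturbed reports concrete, the following is a minimal single-party sketch. It is not the TAP mechanism: it omits adaptive extension, consensus-based pruning, and any cross-party step. It assumes fixed-length bit-string items, generalized randomized response (GRR) as the frequency oracle, and one disjoint user group per tree level so that each user sends exactly one $\epsilon$-LDP report. All function names and the `OTHER` catch-all symbol are hypothetical.

```python
import math
import random
from collections import Counter

OTHER = "*"  # hypothetical catch-all symbol for values outside the candidate set


def grr_probs(d, eps):
    """Keep/flip probabilities of generalized randomized response over d symbols."""
    denom = math.exp(eps) + d - 1
    return math.exp(eps) / denom, 1.0 / denom


def grr_perturb(value, domain, eps, rng):
    """Report the true value with probability p, otherwise a uniform other symbol."""
    p, _ = grr_probs(len(domain), eps)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, eps):
    """Unbiased frequency estimates computed from the perturbed reports."""
    p, q = grr_probs(len(domain), eps)
    counts = Counter(reports)
    n = len(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


def prefix_heavy_hitters(items, bits, k, eps, rng):
    """Level-wise prefix extension; each user reports once, at a single level."""
    users = list(items)
    rng.shuffle(users)
    groups = [users[i::bits] for i in range(bits)]  # one disjoint group per level
    survivors = [""]
    for level in range(1, bits + 1):
        # Extend every surviving prefix by one bit to form the candidate set.
        candidates = [prefix + b for prefix in survivors for b in "01"]
        domain = candidates + [OTHER]
        allowed = set(candidates)
        reports = [
            grr_perturb(x[:level] if x[:level] in allowed else OTHER, domain, eps, rng)
            for x in groups[level - 1]
        ]
        est = grr_estimate(reports, domain, eps)
        # Keep the k prefixes with the largest estimated frequency.
        survivors = sorted(candidates, key=est.get, reverse=True)[:k]
    return survivors


if __name__ == "__main__":
    rng = random.Random(0)
    data = ["1010"] * 1600 + ["0110"] * 1200 + ["0001"] * 200
    print(prefix_heavy_hitters(data, bits=4, k=2, eps=4.0, rng=rng))
```

Splitting users across levels, rather than splitting the privacy budget, is one common design choice in prefix-based LDP protocols; whether the paper makes the same choice is not stated in this summary.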

Paper Structure

This paper contains 27 sections, 4 theorems, 12 equations, 7 figures, 8 tables, 4 algorithms.

Key Result

theorem 1

(Post-Processing) Let $\mathcal{M}:\mathcal{X}\to\mathcal{Y}$ be a mechanism that satisfies $\epsilon$-LDP, and $\mathcal{F}:\mathcal{Y}\to\mathcal{Y}^\prime$ be an arbitrary randomized mapping. Then $\mathcal{F}\circ\mathcal{M}:\mathcal{X}\to\mathcal{Y}^\prime$ satisfies $\epsilon$-LDP, where $\circ$ ...
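
The post-processing property can be checked numerically on a small example. The sketch below assumes generalized randomized response (GRR) as the $\epsilon$-LDP mechanism $\mathcal{M}$ and represents the randomized mapping $\mathcal{F}$ as a row-stochastic matrix; both choices, and all function names, are illustrative assumptions rather than anything taken from the paper. It computes the exact output distributions, so no sampling is involved.

```python
import math


def grr_matrix(d, eps):
    """Row x holds the output distribution Pr[M(x) = y] of GRR over d symbols."""
    denom = math.exp(eps) + d - 1
    p, q = math.exp(eps) / denom, 1.0 / denom
    return [[p if y == x else q for y in range(d)] for x in range(d)]


def compose(M, F):
    """Distribution of F(M(x)): product of the row-stochastic matrices M and F."""
    return [
        [sum(M[x][y] * F[y][z] for y in range(len(F))) for z in range(len(F[0]))]
        for x in range(len(M))
    ]


def max_log_ratio(P):
    """Smallest eps for which the mechanism with output distributions P is eps-LDP."""
    worst = 0.0
    for z in range(len(P[0])):
        column = [row[z] for row in P]
        if max(column) == 0.0:
            continue  # output z never occurs, so it imposes no constraint
        if min(column) == 0.0:
            return math.inf  # some input can produce z while another cannot
        worst = max(worst, math.log(max(column) / min(column)))
    return worst


if __name__ == "__main__":
    eps = 1.0
    M = grr_matrix(4, eps)
    # Hypothetical post-processing step mapping 4 outputs to 3, with randomness.
    F = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.25, 0.25, 0.5]]
    print(max_log_ratio(M), max_log_ratio(compose(M, F)))
```

The second printed value should not exceed the first, which is what the theorem asserts: whatever $\mathcal{F}$ does to the reports, it cannot weaken the privacy guarantee already provided by $\mathcal{M}$.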

Figures (7)

  • Figure 1: An overview of the TAP mechanism.
  • Figure 2: Toy examples of the TAP mechanism.
  • Figure 3: Consensus-based Pruning Strategy.
  • Figure 4: F1 scores vs. privacy budget $\epsilon$ under different $k$.
  • Figure 5: NCR scores vs. privacy budget $\epsilon$ under different $k$.
  • ...and 2 more figures

Theorems & Definitions (6)

  • definition 1: $\epsilon$-LDP
  • theorem 1
  • definition 2: Top-$k$ Federated Heavy Hitter
  • theorem 2
  • theorem 3
  • theorem 4
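
For reference, the standard textbook statement of $\epsilon$-LDP (definition 1 in the list above) is given below. The notation follows theorem 1; the paper's own wording may differ in detail. A randomized mechanism $\mathcal{M}:\mathcal{X}\to\mathcal{Y}$ satisfies $\epsilon$-LDP if, for any two inputs $x, x^\prime \in \mathcal{X}$ and any output $y \in \mathcal{Y}$,

$$
\Pr[\mathcal{M}(x) = y] \le e^{\epsilon} \cdot \Pr[\mathcal{M}(x^\prime) = y].
$$

A smaller $\epsilon$ forces the output distributions for different inputs closer together, which gives stronger privacy at the cost of noisier frequency estimates.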