Federated Analytics-Empowered Frequent Pattern Mining for Decentralized Web 3.0 Applications

Zibo Wang; Yifei Zhu; Dan Wang; Zhu Han

TL;DR

This work tackles the challenge of performing high-quality data analytics in Web 3.0 under strict privacy constraints and with a limited number of participants. It introduces FedWeb, a federated analytics framework for frequent pattern mining that uses a distributed differential privacy (DDP) scheme built on a geometric mechanism decomposed via Pólya variables and secured by aggregation, ensuring $\epsilon$-DDP per data owner and $\epsilon/K$-CDP on the aggregate. The method employs a confidence-bound-based candidate filtering approach using Chebyshev's and Hoeffding's inequalities to guarantee mining accuracy, and adds two budget-saving strategies (candidate padding and data-owner reuse) to reduce the number of required data owners. Experimental results across three Web 3.0 datasets show that FedWeb yields ~$25.3\%$ higher F1 scores while reducing data-owner participation by ~$98.4\%$, indicating strong utility together with substantial privacy and scalability benefits in decentralized applications.

Abstract

The emerging Web 3.0 paradigm aims to decentralize existing web services, enabling desirable properties such as transparency, incentives, and privacy preservation. However, current Web 3.0 applications supported by blockchain infrastructure still cannot support complex data analytics tasks in a scalable and privacy-preserving way. This paper introduces the emerging federated analytics (FA) paradigm into the realm of Web 3.0 services, enabling data to stay local while still contributing to complex web analytics tasks in a privacy-preserving way. We propose FedWeb, a tailored FA design for important frequent pattern mining tasks in Web 3.0. Based on a novel distributed differential privacy technique, FedWeb substantially reduces the number of participating data owners required to support privacy-preserving Web 3.0 data analytics. The correctness of the mining results is guaranteed by a theoretically rigorous candidate filtering scheme based on Hoeffding's inequality and Chebyshev's inequality. Two response-budget-saving strategies are proposed to further reduce the number of participating data owners. Experiments on three representative Web 3.0 scenarios show that FedWeb can improve data utility by ~25.3% and reduce the number of participating data owners by ~98.4%.
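To make the idea of confidence-bound-based candidate filtering concrete, the following is a minimal sketch that uses only a plain Hoeffding bound on an empirical support estimate. It is not FedWeb's actual filtering rule, which according to the abstract also relies on Chebyshev's inequality; the function names, the `delta` failure probability, and the three-way labeling are all assumptions introduced for illustration.

```python
import math


def hoeffding_radius(m, delta):
    """Half-width of a two-sided Hoeffding interval for the mean of m
    observations bounded in [0, 1], holding with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))


def classify(support_estimate, m, threshold, delta=0.05):
    """Label a candidate pattern by comparing its confidence interval
    against the frequency threshold (hypothetical three-way rule)."""
    radius = hoeffding_radius(m, delta)
    if support_estimate - radius > threshold:
        return "frequent"        # lower bound already clears the threshold
    if support_estimate + radius < threshold:
        return "non-frequent"    # upper bound stays below the threshold
    return "undecided"           # more responses would be needed


if __name__ == "__main__":
    for estimate in (0.30, 0.18, 0.02):
        print(estimate, classify(estimate, m=200, threshold=0.15))
```

Under this rule a candidate is only accepted or rejected once its interval no longer straddles the threshold, which is why fewer responses per candidate translate into more undecided candidates.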

Paper Structure (15 sections, 6 theorems, 19 equations, 4 figures, 1 table, 3 algorithms)

Introduction
Preliminary: Privacy for Web 3.0
Threat Model and Problem Formulation
Privacy-Preserving FPM Design for Web 3.0
Phase 1: Candidate pattern distribution
Phase 2: DDP-based private response
Phase 3: Response analysis
Further enhancement on saving response budget
Evaluation
Experiment setting
Experiment results
Related Work
Conclusion
Proof of Theorem \ref{theorem_ucb}
Proof of Theorem \ref{theorem_lcb}

Key Result

Theorem 1

Suppose there are $n$ distributed data owners, and each data owner adds $\mathcal{X}_i - \mathcal{Y}_i$ to its upload, where $\mathcal{X}_i$ and $\mathcal{Y}_i$ are i.i.d. $\mathrm{P\acute{o}lya}(1/n,\alpha)$ variables. The sum of these noise terms over all data owners follows a two-sided geometric distribution, i.e.,
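The decomposition in Theorem 1 can be checked numerically. The sketch below rests on two assumptions, because the statement above is cut off before its formula: that $\mathrm{P\acute{o}lya}(r,\alpha)$ has the usual negative-binomial form $\Pr[X=k]=\binom{k+r-1}{k}\alpha^k(1-\alpha)^r$, and that the two-sided geometric target is $\Pr[Z=z]=\frac{1-\alpha}{1+\alpha}\alpha^{|z|}$. All function names are illustrative.

```python
def polya_pmf(r, alpha, size):
    """First `size` probabilities of Polya(r, alpha) (assumed form:
    P(k) = C(k+r-1, k) * alpha**k * (1-alpha)**r), built by recurrence."""
    pmf = [(1.0 - alpha) ** r]
    for k in range(size - 1):
        pmf.append(pmf[-1] * (k + r) / (k + 1) * alpha)
    return pmf


def convolve(a, b):
    """Truncated convolution of two equal-length PMFs on {0, 1, ...};
    the first len(a) terms of the result are exact."""
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(len(a))]


def sum_pmf(pmf, n):
    """PMF of the sum of n independent copies."""
    total = pmf
    for _ in range(n - 1):
        total = convolve(total, pmf)
    return total


def difference_pmf(pmf, z):
    """P(X - Y = z) for independent X, Y with the same (truncated) PMF."""
    z = abs(z)
    return sum(pmf[k + z] * pmf[k] for k in range(len(pmf) - z))


n, alpha, size = 5, 0.5, 40
one_sided = sum_pmf(polya_pmf(1.0 / n, alpha, size), n)   # sum of the X_i
geometric = [(1 - alpha) * alpha ** k for k in range(size)]
max_gap = max(abs(p - q) for p, q in zip(one_sided, geometric))

# Aggregate noise sum_i (X_i - Y_i) versus the assumed two-sided geometric PMF.
two_sided = {z: difference_pmf(one_sided, z) for z in range(-3, 4)}
target = {z: (1 - alpha) / (1 + alpha) * alpha ** abs(z) for z in range(-3, 4)}

if __name__ == "__main__":
    print("max gap to geometric:", max_gap)
    for z in sorted(two_sided):
        print(z, two_sided[z], target[z])
```

The check works on exact probability mass functions rather than random samples, so the agreement is limited only by floating-point error and the truncation at `size` terms.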

Figures (4)

Figure 1: Possible statuses of privacy preservation and data analytics in Web 3.0 applications. The eye icons indicate whether private data can be learned. The lamp icons indicate whether the web service can be optimized via data analytics.
Figure 2: Performance of FedWeb and the benchmarks on three datasets. The legends of RAPPOR and SFP give their numbers of participating data owners. A result is marked invalid ("inv") if an algorithm outputs no frequent pattern.
Figure 3: Performance of FedWeb with different response budget saving strategies on the MSNBC dataset. The candidate padding and data owner reusing strategies are abbreviated as "padding" and "reusing", respectively.
Figure 4: Performance of FedWeb under different $\epsilon$ and $K$ on the MSNBC dataset.

Theorems & Definitions (12)

Definition 1: Geometric mechanism
Definition 2: Pólya distribution
Theorem 1
proof
Theorem 2
proof
Theorem 3: Confidence bound of frequent patterns
proof
Theorem 4: Confidence bound of non-frequent patterns
proof
...and 2 more
