Federated Analytics-Empowered Frequent Pattern Mining for Decentralized Web 3.0 Applications
Zibo Wang, Yifei Zhu, Dan Wang, Zhu Han
TL;DR
This work tackles the challenge of performing high-quality data analytics in Web 3.0 under strict privacy constraints and with a limited number of participants. It introduces FedWeb, a federated analytics framework for frequent pattern mining. FedWeb uses a distributed differential privacy (DDP) scheme built on a geometric mechanism that is decomposed via Pólya random variables and protected by secure aggregation, ensuring $\epsilon$-DDP per data owner and $\epsilon/K$-CDP on the aggregate. To guarantee mining accuracy, the method filters candidates with confidence bounds derived from Chebyshev's and Hoeffding's inequalities. It also adds two budget-saving strategies, candidate padding and data-owner reuse, that sharply reduce the number of required data owners. Experiments on three Web 3.0 datasets show that FedWeb achieves ~$25.3\%$ higher F1 scores while reducing data-owner participation by ~$98.4\%$, demonstrating strong utility together with substantial privacy and scalability benefits for decentralized applications.
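The distributed noise generation summarized above can be illustrated with a minimal Python sketch. It relies on a known property: a two-sided geometric variable with ratio $\alpha = e^{-\epsilon}$ is the difference of two independent one-sided geometric variables, and each one-sided geometric variable is the sum of $K$ independent Pólya variables (negative binomial with shape $1/K$). Each of the $K$ data owners therefore adds the difference of two Pólya draws to its local count, and only the sum carries the full geometric noise. The function names (`polya_share`, `ddp_noisy_count`), the use of NumPy, and the calibration $\alpha = e^{-\epsilon}$ for a sensitivity-1 count are assumptions made for illustration. Secure aggregation is replaced by a plain sum, and the exact parameterization FedWeb uses to obtain its stated guarantees is not reproduced here.

```python
import math

import numpy as np


def polya_share(num_owners, alpha, rng):
    """One data owner's noise share: the difference of two Polya(1/K, alpha) draws.

    NumPy's negative_binomial(n, p) counts failures before n successes and accepts
    a real-valued n > 0, so negative_binomial(1/K, 1 - alpha) is a Polya variable.
    Summing K such draws gives a one-sided geometric variable with ratio alpha.
    """
    shape = 1.0 / num_owners
    up = int(rng.negative_binomial(shape, 1.0 - alpha))
    down = int(rng.negative_binomial(shape, 1.0 - alpha))
    return up - down


def ddp_noisy_count(local_counts, epsilon, rng):
    """Sum K local counts, each perturbed by its owner's Polya noise share.

    With alpha = exp(-epsilon), the total noise follows the two-sided geometric
    distribution P(Z = z) = (1 - alpha) / (1 + alpha) * alpha**|z|, which is
    epsilon-DP for a count with sensitivity 1. Secure aggregation is NOT modeled
    in this sketch: the noisy reports are simply added in the clear.
    """
    num_owners = len(local_counts)
    alpha = math.exp(-epsilon)
    noisy_reports = [c + polya_share(num_owners, alpha, rng) for c in local_counts]
    return sum(noisy_reports)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # True total is 12; the printed value includes one draw of geometric noise.
    print(ddp_noisy_count([3, 4, 5], epsilon=1.0, rng=rng))
```

Sampling the Pólya shares through NumPy's `negative_binomial`, which accepts a real-valued shape parameter, avoids writing a separate Gamma-Poisson sampler. Because no single share is geometric on its own, an individual noisy report reveals less about the aggregate noise than a centrally added draw would.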
Abstract
The emerging Web 3.0 paradigm aims to decentralize existing web services, enabling desirable properties such as transparency, incentives, and privacy preservation. However, current Web 3.0 applications built on blockchain infrastructure still cannot perform complex data analytics tasks in a scalable and privacy-preserving way. This paper introduces the emerging federated analytics (FA) paradigm into the realm of Web 3.0 services, enabling data to stay local while still contributing to complex web analytics tasks in a privacy-preserving way. We propose FedWeb, an FA design tailored to the important task of frequent pattern mining in Web 3.0. Based on a novel distributed differential privacy technique, FedWeb substantially reduces the number of participating data owners required to support privacy-preserving Web 3.0 data analytics. The correctness of the mining results is guaranteed by a theoretically rigorous candidate filtering scheme based on Hoeffding's inequality and Chebyshev's inequality. Two response-budget-saving strategies are proposed to further reduce the number of participating data owners. Experiments on three representative Web 3.0 scenarios show that FedWeb can improve data utility by ~25.3% and reduce the number of participating data owners by ~98.4%.
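The confidence-bound filtering idea can be sketched as follows, under assumptions that are not taken from the paper. The sketch assumes that $m$ data owners are sampled independently, that each reports a 0/1 indicator for a candidate pattern, and that the aggregate count carries two-sided geometric noise with $\alpha = e^{-\epsilon}$ and variance $2\alpha/(1-\alpha)^2$. Hoeffding's inequality bounds the sampling error, Chebyshev's inequality bounds the noise, and a union bound combines the two, so with probability at least $1 - \delta_s - \delta_n$ the true frequency lies within the returned margin of the noisy estimate. The helper names, default $\delta$ values, and the three-way decision rule are hypothetical and may differ from FedWeb's actual filter.

```python
import math


def hoeffding_radius(num_owners, delta):
    # Hoeffding for the mean of m independent [0, 1] reports:
    # P(|p_hat - p| >= t) <= 2 exp(-2 m t^2)  =>  t = sqrt(ln(2 / delta) / (2 m))
    return math.sqrt(math.log(2.0 / delta) / (2.0 * num_owners))


def chebyshev_radius(variance, delta):
    # Chebyshev for zero-mean noise Z: P(|Z| >= s) <= Var / s^2  =>  s = sqrt(Var / delta)
    return math.sqrt(variance / delta)


def geometric_noise_variance(epsilon):
    # Variance of the two-sided geometric distribution with alpha = exp(-epsilon)
    alpha = math.exp(-epsilon)
    return 2.0 * alpha / (1.0 - alpha) ** 2


def classify_candidate(noisy_count, num_owners, threshold, epsilon,
                       delta_sampling=0.025, delta_noise=0.025):
    """Hypothetical three-way filter for one candidate pattern.

    The margin bounds |noisy_freq - true_freq| with probability at least
    1 - delta_sampling - delta_noise (union bound over the two error sources).
    """
    noisy_freq = noisy_count / num_owners
    noise_radius = chebyshev_radius(geometric_noise_variance(epsilon), delta_noise)
    margin = hoeffding_radius(num_owners, delta_sampling) + noise_radius / num_owners
    if noisy_freq - margin >= threshold:
        return "frequent"
    if noisy_freq + margin < threshold:
        return "infrequent"
    return "undecided"


if __name__ == "__main__":
    for count in (2000, 1000, 500):
        print(count, classify_candidate(count, 10000, 0.1, 1.0))
```

With $m = 10000$, a threshold of $0.1$, and $\epsilon = 1$, the margin is about $0.016$, so a noisy count of 2000 is classified as frequent, 500 as infrequent, and 1000 stays undecided. In a scheme of this kind, undecided candidates are the ones that would need additional responses to resolve.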
