BagChain: A Dual-functional Blockchain Leveraging Bagging-based Distributed Learning
Zixiang Cui, Xintong Ling, Xingyu Zhou, Jiaheng Wang, Zhi Ding, Xiqi Gao
TL;DR
BagChain tackles privacy-preserving distributed learning on open networks by replacing the otherwise wasted computation of proof-of-work (PoW) with bagging-based ML training and ensemble aggregation. It introduces a three-layer blockchain (MiniBlock, Ensemble Block, Key Block) and a Cross Fork Sharing mechanism that maximizes base-model utilization and reduces the computation wasted on forks, complemented by a task queue that prevents dataset leakage. ChainXim-based simulations show that BagChain ensembles consistently outperform baselines trained only on local private data and remain robust under non-IID data and varying network conditions. The framework aligns incentives without a central coordinator and preserves privacy, with practical mechanisms for validation, security, and scalability demonstrated on diverse ML tasks.
Abstract
This work proposes a dual-functional blockchain framework named BagChain for bagging-based decentralized learning. BagChain integrates blockchain with distributed machine learning by replacing the computationally costly hash operations in proof-of-work with machine-learning model training. BagChain uses individual miners' private data samples and limited computing resources to train base models, which may be very weak, and then aggregates them into strong ensemble models. Specifically, we design a three-layer blockchain structure, together with the corresponding generation and validation mechanisms, to enable distributed machine learning among uncoordinated miners in a permissionless and open setting. To reduce the computation wasted on blockchain forks, we further propose the cross fork sharing mechanism for practical networks with long delays. Extensive experiments demonstrate the superiority and efficacy of BagChain on various machine learning tasks with both independent and identically distributed (IID) and non-IID datasets. BagChain remains robust and effective even under constrained local computing capability, heterogeneous private user data, and sparse network connectivity.
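The bagging idea in the abstract, base models trained separately on private data and then combined into an ensemble, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from BagChain itself: the toy one-dimensional task, the decision-stump base learner, the unweighted majority vote, and all function names (`make_shard`, `train_stump`, `ensemble_predict`, `run_demo`). It involves no blockchain logic and only shows the train-locally-then-aggregate pattern.

```python
import random
from collections import Counter


def make_shard(rng, size):
    """Hypothetical private shard: 1-D points labelled 1 when x >= 0.5."""
    xs = [rng.random() for _ in range(size)]
    return [(x, 1 if x >= 0.5 else 0) for x in xs]


def train_stump(samples):
    """Fit a decision stump (single threshold) by exhaustive search."""
    best = None  # (training errors, threshold, polarity)
    for threshold, _ in samples:
        for polarity in (1, -1):
            errors = sum(
                (1 if polarity * (x - threshold) >= 0 else 0) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: 1 if polarity * (x - threshold) >= 0 else 0


def ensemble_predict(models, x):
    """Aggregate base-model outputs by unweighted majority vote (assumed rule)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def accuracy(predict, data):
    """Fraction of (x, y) pairs that the predictor labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


def run_demo(num_miners=5, shard_size=20, seed=0):
    rng = random.Random(seed)
    # Each miner trains only on its own shard; raw data is never pooled.
    models = [train_stump(make_shard(rng, shard_size)) for _ in range(num_miners)]
    # Fixed evaluation grid standing in for a task's test set.
    test_set = [(i / 200, 1 if i / 200 >= 0.5 else 0) for i in range(200)]
    base_acc = [accuracy(m, test_set) for m in models]
    ens_acc = accuracy(lambda x: ensemble_predict(models, x), test_set)
    return base_acc, ens_acc


if __name__ == "__main__":
    base_acc, ens_acc = run_demo()
    print("base-model accuracies:", base_acc)
    print("ensemble accuracy:", ens_acc)
```

An odd number of miners avoids ties in the binary vote. In this toy setting the ensemble's decision threshold is the median of the individual stump thresholds, so the ensemble is never worse than the weakest base model on the evaluation grid. Real tasks with noisy or non-IID data give no such guarantee.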
