When Swarm Learning meets energy series data: A decentralized collaborative learning design based on blockchain
Lei Xu, Yulong Chen, Yuntian Chen, Longfeng Nie, Xuetao Wei, Liang Xue, Dongxiao Zhang
TL;DR
The paper tackles data sensitivity in energy-sector forecasting by introducing Swarm Learning (SL), a blockchain-based decentralized collaborative learning framework that replaces the central server with consensus-driven, on-chain parameter aggregation via smart contracts. The global weight update follows $\mathbf{w}_{e+1} \leftarrow \lambda \sum^{pK}_{k=1} \frac{N_{o,h}}{N} \mathbf{w}^{o,h}_{e+1}$, enabling secure, auditable collaboration among organizations. Evaluations on photovoltaic power generation forecasting, gas well production prediction, and geophysical well log generation show that SL consistently outperforms Local Learning and performs competitively with Central Learning, while offering privacy and security advantages over server-based approaches. The results also indicate that increasing the data volume and the number of local epochs, up to a threshold, can reduce performance variance and improve stability; the authors suggest off-chain storage as future work to address blockchain scalability constraints.
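The weighted aggregation rule in the TL;DR can be illustrated with a short sketch. It assumes that the $pK$ summands are the selected devices (device $h$ of organization $o$), that $N_{o,h}$ is a device's local sample count, that $N$ is the total sample count over those selected devices, and that $\lambda$ is a scalar factor. These readings are assumptions rather than details confirmed by the summary, and the function name `aggregate` is hypothetical.

```python
from typing import Sequence


def aggregate(local_weights: Sequence[Sequence[float]],
              sample_counts: Sequence[int],
              lam: float = 1.0) -> list[float]:
    """Sketch of w_{e+1} = lam * sum_k (N_{o,h} / N) * w^{o,h}_{e+1}.

    local_weights: one flattened weight vector per selected device (hypothetical layout).
    sample_counts: local sample count N_{o,h} for each device.
    lam: scalar factor lambda (assumed to be a plain hyperparameter).
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per weight vector, and at least one vector")
    total = sum(sample_counts)  # N: assumed total samples over the selected devices
    dim = len(local_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(local_weights, sample_counts):
        if len(w) != dim:
            raise ValueError("all weight vectors must have the same length")
        share = n / total  # data share N_{o,h} / N of this device
        for i in range(dim):
            global_w[i] += share * w[i]
    # Scale the data-weighted average by lambda.
    return [lam * x for x in global_w]
```

For example, `aggregate([[1.0, 2.0], [3.0, 4.0]], [100, 300])` weights the second device three times as heavily as the first and yields `[2.5, 3.5]`; weighting by sample count keeps devices with little data from pulling the global model as strongly as data-rich ones.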
Abstract
Machine learning models offer the capability to forecast future energy production or consumption and infer essential unknown variables from existing data. However, legal and policy constraints within specific energy sectors render the data sensitive, presenting technical hurdles in utilizing data from diverse sources. Therefore, we propose adopting a Swarm Learning (SL) scheme, which replaces the centralized server with a blockchain-based distributed network to address the security and privacy issues inherent in the centralized architecture of Federated Learning (FL). Within this distributed collaborative learning framework, each participating organization governs nodes for inter-organizational communication. Devices from various organizations use smart contracts to upload and retrieve parameters. A consensus mechanism ensures distributed consistency throughout the learning process and guarantees the transparency, trustworthiness, and immutability of on-chain parameters. The efficacy of the proposed framework is demonstrated in three real-world energy series modeling scenarios, where it outperforms Local Learning approaches while offering stronger data security and privacy than Centralized Learning and FL methods. Notably, as the data volume and the number of local epochs increase up to a threshold, model performance improves and the variance of performance errors decreases, which increases the stability and reliability of the model's outputs.
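The abstract describes devices uploading and retrieving parameters through smart contracts, with on-chain records made immutable. As a minimal sketch of only the tamper-evidence part, assuming a single-process, hash-chained, append-only log, the class below records parameter uploads and detects later modification. The names `ParameterLedger`, `upload`, `retrieve`, and `verify` are hypothetical; no consensus mechanism, networking, or smart-contract platform is modeled, so this is not the paper's implementation.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class ParameterLedger:
    """Toy append-only, hash-chained log of parameter uploads (hypothetical)."""
    blocks: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so identical content always hashes identically.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def upload(self, org: str, device: str, epoch: int,
               weights: list, n_samples: int) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"org": org, "device": device, "epoch": epoch,
                "weights": list(weights),  # copy so caller mutations cannot alter the record
                "n_samples": n_samples, "prev": prev}
        block = dict(body, hash=self._digest(body))
        self.blocks.append(block)
        return block["hash"]

    def retrieve(self, epoch: int) -> list:
        # Return the (weights, n_samples) pairs recorded for one training round.
        return [(b["weights"], b["n_samples"]) for b in self.blocks if b["epoch"] == epoch]

    def verify(self) -> bool:
        # Recompute every hash and check each block links to its predecessor.
        prev = GENESIS
        for b in self.blocks:
            body = {k: v for k, v in b.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != b["hash"]:
                return False
            prev = b["hash"]
        return True
```

Editing any recorded weight changes that block's recomputed hash, so `verify()` returns `False`; in a real blockchain the consensus mechanism is what stops a single participant from simply rewriting the whole chain. The `(weights, n_samples)` pairs returned by `retrieve` carry exactly the per-device information a sample-weighted aggregation needs.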
