Decentralised Resource Sharing in TinyML: Wireless Bilayer Gossip Parallel SGD for Collaborative Learning

Ziyuan Bao; Eiman Kanjo; Soumya Banerjee; Hasib-Al Rashid; Tinoosh Mohsenin

TL;DR

This work addresses decentralised learning on resource-constrained edge devices by proposing bilayer Gossip Decentralised Parallel SGD (GD-PSGD), which combines DK-means geographic clustering with gossip-based intra- and inter-cluster model averaging and cumulative FedAvg aggregation. The method targets intermittent connectivity, limited communication range, and dynamic topologies while approaching the accuracy of Centralised Federated Learning (CFL). The paper derives theoretical convergence bounds for D-PSGD with gossip and for the bilayer network. Experiments on CIFAR-10 with MCUNet-in3 show that, on IID data, the method reaches accuracy comparable to CFL with only 1.8 additional rounds to converge; on Non-IID data, accuracy loss stays below 8% under moderate data imbalance and degrades gracefully under stronger skew. These findings indicate that bilayer GD-PSGD can enable scalable, privacy-preserving learning on edge devices with minimal performance trade-offs, even under challenging wireless and topological conditions.

Abstract

With the growing computational capabilities of microcontroller units (MCUs), edge devices can now support machine learning models. However, deploying decentralised federated learning (DFL) on such devices presents key challenges, including intermittent connectivity, limited communication range, and dynamic network topologies. This paper proposes a novel framework, bilayer Gossip Decentralised Parallel Stochastic Gradient Descent (GD-PSGD), designed to address these issues in resource-constrained environments. The framework incorporates a hierarchical communication structure using Distributed K-means (DK-means) clustering for geographic grouping and a gossip protocol for efficient model aggregation across two layers: intra-cluster and inter-cluster. We evaluate the framework's performance against the Centralised Federated Learning (CFL) baseline using the MCUNet model on the CIFAR-10 dataset under IID and Non-IID conditions. Results demonstrate that the proposed method achieves comparable accuracy to CFL on IID datasets, requiring only 1.8 additional rounds for convergence. On Non-IID datasets, the accuracy loss remains under 8% for moderate data imbalance. These findings highlight the framework's potential to support scalable and privacy-preserving learning on edge devices with minimal performance trade-offs.
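
The two-layer aggregation described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it omits local SGD steps, wireless range limits, and DK-means itself, and it assumes that clusters are already given and that only one designated head per cluster takes part in inter-cluster gossip. The function names (`gossip_pair_average`, `gossip_round`, `bilayer_gossip`) and all parameters are hypothetical.

```python
import random


def gossip_pair_average(weights, i, j):
    """Replace the models of devices i and j with their element-wise mean."""
    avg = [(a + b) / 2.0 for a, b in zip(weights[i], weights[j])]
    weights[i] = list(avg)
    weights[j] = list(avg)


def gossip_round(weights, members, rng):
    """One 1-to-1 gossip round: shuffle the members, then average disjoint pairs.

    With an odd number of members, the last device in the shuffled order
    sits out this round.
    """
    order = list(members)
    rng.shuffle(order)
    for k in range(0, len(order) - 1, 2):
        gossip_pair_average(weights, order[k], order[k + 1])


def bilayer_gossip(weights, clusters, heads, intra_rounds, inter_rounds, seed=0):
    """Run intra-cluster gossip first, then inter-cluster gossip among heads.

    Assumption (not from the paper): only the listed heads exchange models
    across clusters, and the two layers run strictly one after the other.
    """
    rng = random.Random(seed)
    for _ in range(intra_rounds):
        for members in clusters:
            gossip_round(weights, members, rng)
    for _ in range(inter_rounds):
        gossip_round(weights, heads, rng)
    return weights


if __name__ == "__main__":
    # Four devices with one-parameter "models", two clusters, heads 0 and 2.
    models = {0: [0.0], 1: [2.0], 2: [4.0], 3: [10.0]}
    result = bilayer_gossip(models, [[0, 1], [2, 3]], [0, 2], 1, 1)
    # Heads 0 and 2 end at [4.0]; devices 1 and 3 keep [1.0] and [7.0].
    print(result)
```

Pairwise averaging leaves the sum of all models unchanged, so repeated rounds move every device toward the global mean without a central server. This is the property that gossip-based aggregation relies on.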

Paper Structure (28 sections, 26 equations, 16 figures, 3 algorithms)

Introduction
Related Work
Decentralised Federated Learning Paradigms
Parallel Stochastic Gradient Descent (SGD)
Methodology
Algorithms
Convergence Framework
Convergence Rate of D-PSGD
Convergence Rate of D-PSGD with Gossip Protocol
The Overall Convergence Rate of D-PSGD with 1-to-1 Gossip Protocol on a Bilayer Network
Experiments
Dataset and Model
Experimental Setup
Results and Discussion
Performance on Non-IID Datasets
...and 13 more sections

Figures (16)

Figure 1: Comparison of Centralised Federated Learning and Decentralised Federated Learning Architectures.
Figure 2: Examples of topologies used in distributed systems: Line, Mesh, Star, and Hybrid.
Figure 3: (a) Devices before DK-means, and (b) the topology after DK-means. Black edges represent intra-cluster topology, and red edges represent inter-cluster topology. Due to limited communication range, inter-cluster topology is sparser.
Figure 4: (a) Devices exchange weights via 1-to-1 gossip protocol within clusters, and (b) inter-cluster communication. Devices prioritise neighbours not previously contacted, increasing efficiency in weight dissemination.
Figure 5: Simulation environment with 30 devices. Circles represent normal devices, triangles represent cluster headers, and the numbers are the devices' unique IDs. After DK-means, the devices are divided into 4 clusters.
...and 11 more figures
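
The caption of Figure 4 notes that devices prioritise neighbours they have not previously contacted. A minimal sketch of one way such a selection rule could work is shown below. The function `pick_gossip_peer`, the `contacted` history structure, and the reset-when-exhausted policy are assumptions made for illustration and are not taken from the paper.

```python
import random


def pick_gossip_peer(device, neighbours, contacted, rng):
    """Pick a gossip peer, preferring neighbours not yet contacted.

    `contacted` maps each device ID to the set of peers it has already
    gossiped with. When every neighbour has been contacted, the history
    is cleared and a new sweep starts (an assumed policy).
    """
    if not neighbours:
        return None  # isolated device: nobody in range
    fresh = [n for n in neighbours if n not in contacted[device]]
    if not fresh:
        contacted[device].clear()
        fresh = list(neighbours)
    peer = rng.choice(fresh)
    contacted[device].add(peer)
    return peer


if __name__ == "__main__":
    rng = random.Random(0)
    history = {0: set()}
    # The first three picks cover neighbours 1, 2 and 3 in some order.
    print([pick_gossip_peer(0, [1, 2, 3], history, rng) for _ in range(3)])
```

Under this rule a device with k neighbours reaches all of them within k picks, whereas uniform random selection may repeat a peer before covering the full neighbourhood.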
