Mosaic Learning: A Framework for Decentralized Learning with Model Fragmentation

Sayan Biswas; Davide Frey; Romaric Gaudel; Nirupam Gupta; Anne-Marie Kermarrec; Dimitri Lerévérend; Rafael Pires; Rishi Sharma; François Taïani; Martijn de Vos

TL;DR

Mosaic Learning addresses the efficiency of decentralized learning (DL) by partitioning each local model into $K$ fragments and disseminating them through fragment-specific gossip, keeping total communication unchanged. The paper shows that its worst-case convergence rate matches that of the state-of-the-art epidemic learning (EL) baseline and that, in convex settings, more fragments improve consensus by reducing the contraction factor $\rho(M_t^{\top}M_t)$. Empirically, Mosaic Learning yields gains of up to $12$ percentage points in node-level accuracy under highly heterogeneous data while matching EL performance in IID scenarios, supporting fragmentation as a first-class primitive for DL. The work thus offers both theoretical guarantees and practical improvements, suggesting that fragmentation can enhance scalability and robustness without extra communication cost.
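To make fragmentation and fragment-specific gossip concrete, here is a minimal sketch of one synchronous round. It assumes contiguous, nearly equal fragments, an EL-style push in which every fragment of every node goes to `s` peers drawn independently per fragment, and plain averaging of all received copies with the node's own copy. `fragment_bounds` and `mosaic_round` are hypothetical names, local SGD steps are omitted, and this is not the authors' implementation.

```python
import random
from typing import List, Tuple

Model = List[float]


def fragment_bounds(d: int, K: int) -> List[range]:
    """Split parameter indices 0..d-1 into K contiguous, nearly equal fragments (assumed scheme)."""
    base, extra = divmod(d, K)
    bounds, start = [], 0
    for k in range(K):
        size = base + (1 if k < extra else 0)
        bounds.append(range(start, start + size))
        start += size
    return bounds


def mosaic_round(models: List[Model], K: int, s: int,
                 rng: random.Random) -> Tuple[List[Model], List[int]]:
    """One gossip round (assumed EL-style push): returns new models and parameters sent per node."""
    n, d = len(models), len(models[0])
    frags = fragment_bounds(d, K)
    # inbox[j][k] holds every copy of fragment k that node j will average, starting with its own.
    inbox = [[[[models[j][i] for i in frag]] for frag in frags] for j in range(n)]
    sent = [0] * n
    for j in range(n):
        others = [p for p in range(n) if p != j]
        for k, frag in enumerate(frags):
            # Recipients are drawn independently per fragment, so fragments travel along different paths.
            for p in rng.sample(others, s):
                inbox[p][k].append([models[j][i] for i in frag])
                sent[j] += len(frag)
    new_models = []
    for j in range(n):
        x = [0.0] * d
        for k, frag in enumerate(frags):
            copies = inbox[j][k]
            # Uniform average of the node's own fragment and every received copy (assumed rule).
            for pos, i in enumerate(frag):
                x[i] = sum(c[pos] for c in copies) / len(copies)
        new_models.append(x)
    return new_models, sent


if __name__ == "__main__":
    rng = random.Random(0)
    models = [[float(j)] * 12 for j in range(8)]
    for K in (1, 3, 4):
        _, sent = mosaic_round(models, K, s=2, rng=rng)
        print(K, sent[0])  # sent[0] == s * d == 24 for every K
```

Because each node still pushes exactly `s * d` parameters per round, increasing `K` changes only which peers receive which parts of the model, not the volume sent.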

Abstract

Decentralized learning (DL) enables collaborative machine learning (ML) without a central server, making it suitable for settings where training data cannot be centrally hosted. We introduce Mosaic Learning, a DL framework that decomposes models into fragments and disseminates them independently across the network. Fragmentation reduces redundant communication across correlated parameters and enables more diverse information propagation without increasing communication cost. We theoretically show that Mosaic Learning (i) achieves a worst-case convergence rate matching the state of the art, and (ii) leverages parameter correlation in an ML model, improving contraction by reducing the largest eigenvalue of a simplified system. We empirically evaluate Mosaic Learning on four learning tasks and observe up to 12 percentage points higher node-level test accuracy than epidemic learning (EL), a state-of-the-art baseline. In summary, Mosaic Learning improves DL performance without sacrificing utility or efficiency, and positions itself as a new DL standard.
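The claim that fragmentation adds no communication can be checked with a short count. This assumes equal-size fragments of $d/K$ parameters and an EL-style push in which each message goes to $s$ peers; the fanout symbol $s$ is notation introduced here for illustration.

```latex
\begin{aligned}
\text{EL:}\quad & s \cdot d \ \text{parameters sent per node per round},\\
\text{Mosaic Learning:}\quad & \sum_{k=1}^{K} s \cdot \frac{d}{K} = s \cdot d \ \text{parameters sent per node per round}.
\end{aligned}
```

Only the recipients change: when each of the $K$ fragments picks its own $s$ peers, a node's information can reach up to $\min(Ks,\, n-1)$ distinct nodes per round instead of $s$.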

Paper Structure

This paper contains 25 sections, 11 theorems, 34 equations, 13 figures, and 1 algorithm.

Introduction
Background and Preliminaries
Standard decentralized learning (D-PSGD)
EL
The Mosaic Learning framework
Theoretical Analysis
General convergence
Impact of the number of fragments
Evaluation
Experimental Setup
The effect of the number of fragments
The effect of graph degree
The effect of data heterogeneity
Related Work
FL
...and 10 more sections

Key Result

Theorem 1

Under the typical assumptions of …, consider Mosaic Learning as described in Algorithm 1 with $K$ fragments. Then, with an appropriate choice of stepsize $\eta$ and after $T$ iterations, it holds that $\frac{1}{nT} \sum_{i=1}^{n}\sum_{t=0}^{T-1}\mathbb{E}\left[\left\|\nabla F\left(x_t^{(i)}\right)\right\|^2\right] \leq \epsilon$ as long as …
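Theorem 1 bounds the squared gradient norm averaged over all $n$ nodes and $T$ iterations. As an illustration only, the sketch below evaluates the empirical counterpart of that left-hand side on a logged trajectory, using a single run in place of the expectation. The function names, the trajectory layout `traj[t][i]` $= x_t^{(i)}$, and the toy objective $F(x) = \tfrac{1}{2}\|x\|^2$ are assumptions made for this example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def average_sq_grad_norm(traj: List[List[Vector]],
                         grad_F: Callable[[Vector], Vector]) -> float:
    """Empirical (1/(nT)) * sum_i sum_t ||grad F(x_t^(i))||^2, with traj[t][i] = x_t^(i)."""
    T = len(traj)
    n = len(traj[0])
    total = 0.0
    for t in range(T):
        for i in range(n):
            g = grad_F(traj[t][i])
            total += sum(gk * gk for gk in g)
    return total / (n * T)


def grad_half_sq_norm(x: Vector) -> List[float]:
    """Gradient of the hypothetical toy objective F(x) = 0.5 * ||x||^2, which is x itself."""
    return list(x)


if __name__ == "__main__":
    traj = [
        [[1.0, 0.0], [0.0, 1.0]],  # t = 0: models of node 1 and node 2
        [[0.0, 0.0], [1.0, 1.0]],  # t = 1
    ]
    print(average_sq_grad_norm(traj, grad_half_sq_norm))  # 1.0
```

Comparing this quantity against a target $\epsilon$ over increasing $T$ gives a rough, single-run sanity check of the kind of guarantee the theorem states; it does not reproduce the paper's bound.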

Figures (13)

Figure 1: Sending and receiving models in standard DL (left) and Mosaic Learning (right).
Figure 2: Eigenvalues of $M^{\top}M$ as a function of $K$ for two examples with $n=50$ nodes and $d=16$ parameters. Increasing $K$ decreases the contraction factor, improving consensus.
Figure 3: Consensus distance $\|X_t - \bar{X}_t\|^2$ as a function of the number of fragments $K$ for two examples with $n=50$ nodes and $d=16$ parameters. Increasing $K$ improves consensus speed.
Figure 4: Performance of Mosaic Learning across iterations and number of fragments ($K$) for CIFAR-10 (top), CIFAR-100 (middle), and MovieLens (bottom), showing node-average (left) and model-average (right) test accuracies.
Figure 5: Consensus distance (left) and standard deviation of node performance (right) across iterations and $K$, for CIFAR-10 (top), CIFAR-100 (middle), and MovieLens (bottom), on a network with degree 8.
...and 8 more figures
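Figures 3 and 5 track the consensus distance $\|X_t - \bar{X}_t\|^2$, where the rows of $X_t$ are the $n$ local models and every row of $\bar{X}_t$ is their average; Figure 5 also reports the spread of node-level accuracy. A minimal sketch of both metrics follows. The function names are hypothetical, and using the population standard deviation rather than the sample one is an assumption, since the captions do not specify it.

```python
import statistics
from typing import List, Sequence


def consensus_distance(X: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius norm ||X - X_bar||^2, where each row of X_bar is the mean model."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return sum((row[j] - mean[j]) ** 2 for row in X for j in range(d))


def node_accuracy_std(accuracies: List[float]) -> float:
    """Spread of per-node test accuracy (population standard deviation, an assumption)."""
    return statistics.pstdev(accuracies)


if __name__ == "__main__":
    X = [[0.0, 0.0], [2.0, 2.0]]  # two nodes, two parameters each
    print(consensus_distance(X))  # 4.0
    print(node_accuracy_std([0.6, 0.7, 0.8]))
```

A consensus distance of zero means every node holds the same model, so curves that fall faster with larger $K$ indicate faster agreement among nodes.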

Theorems & Definitions (17)

Remark 1
Theorem 1: Convergence of Mosaic Learning
Remark 2
Remark 3
Lemma 1: Consensus error evolution
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Lemma 6
...and 7 more
