Mosaic Learning: A Framework for Decentralized Learning with Model Fragmentation

Sayan Biswas, Davide Frey, Romaric Gaudel, Nirupam Gupta, Anne-Marie Kermarrec, Dimitri Lerévérend, Rafael Pires, Rishi Sharma, François Taïani, Martijn de Vos

TL;DR

Mosaic Learning targets the efficiency of decentralized training by partitioning each local model into $K$ fragments and disseminating them through fragment-specific gossip, while keeping total communication unchanged. The paper shows that its worst-case convergence rate matches that of the state-of-the-art Epidemic Learning (EL) baseline and that, in convex settings, a larger number of fragments improves consensus by reducing the contraction factor $\rho(M_t^{\top}M_t)$. Empirically, Mosaic Learning yields gains of up to 12 percentage points in node-level accuracy under highly heterogeneous data while matching EL performance in IID scenarios, supporting fragmentation as a first-class primitive for DL. The work thus offers both theoretical guarantees and practical improvements, suggesting that fragmentation can enhance scalability and robustness at no extra communication cost.

Abstract

Decentralized learning (DL) enables collaborative machine learning (ML) without a central server, making it suitable for settings where training data cannot be centrally hosted. We introduce Mosaic Learning, a DL framework that decomposes models into fragments and disseminates them independently across the network. Fragmentation reduces redundant communication across correlated parameters and enables more diverse information propagation without increasing communication cost. We theoretically show that Mosaic Learning (i) achieves a state-of-the-art worst-case convergence rate, and (ii) leverages parameter correlation in an ML model, improving contraction by reducing the highest eigenvalue of a simplified system. We empirically evaluate Mosaic Learning on four learning tasks and observe up to 12 percentage points higher node-level test accuracy compared to epidemic learning (EL), a state-of-the-art baseline. In summary, Mosaic Learning improves DL performance without sacrificing its utility or efficiency, and positions itself as a new DL standard.
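
To make the idea of fragment-wise dissemination concrete, the following is a minimal sketch of one gossip round, not the paper's algorithm. It assumes contiguous, equally sized fragments, that each node pushes each fragment to `s` uniformly random peers chosen independently per fragment, and that receivers take a uniform average of their own copy and whatever they received. The names `split_fragments`, `mosaic_gossip_round`, and the parameter `s` are hypothetical and introduced only for illustration.

```python
import numpy as np


def split_fragments(d, K):
    """Partition parameter indices 0..d-1 into K contiguous fragments.

    Contiguous splitting is an assumption of this sketch; the source does
    not specify how fragments are formed.
    """
    return np.array_split(np.arange(d), K)


def mosaic_gossip_round(X, K, s, rng):
    """One illustrative fragment-wise gossip round.

    X   : (n, d) array, row i holds node i's model parameters.
    K   : number of fragments.
    s   : number of peers each node sends each fragment to (s <= n - 1).
    rng : numpy.random.Generator.

    Each fragment uses its own, independently sampled set of recipients.
    Every node still sends s * d parameters in total, the same volume as
    sending the full model to s peers.
    """
    n, d = X.shape
    X_new = np.empty_like(X, dtype=float)
    for idx in split_fragments(d, K):
        # W[j, i] = 1 means node j holds node i's copy of this fragment.
        W = np.eye(n)
        for i in range(n):
            others = np.delete(np.arange(n), i)
            peers = rng.choice(others, size=s, replace=False)
            W[peers, i] = 1.0
        # Row-normalize so each node takes a uniform average.
        W /= W.sum(axis=1, keepdims=True)
        X_new[:, idx] = W @ X[:, idx]
    return X_new


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 16))
X_next = mosaic_gossip_round(X, K=4, s=2, rng=rng)
```

Setting `K=1` in this sketch reduces it to sending whole models to `s` random peers, which makes it easy to compare the two modes under the same communication budget.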

Paper Structure (25 sections, 11 theorems, 34 equations, 13 figures, 1 algorithm)

Key Result

Theorem 1

Consider the typical assumptions of . Consider Mosaic Learning as described in Algorithm 1 with $K$ fragments. Then, with an appropriate choice of stepsize $\eta$ and after $T$ iterations, it holds that $\frac{1}{nT} \sum_{i=1}^{n}\sum_{t=0}^{T-1}\mathbb{E}\left[\left\|\nabla F\left(x_t^{(i)}\right)\right\|^2\right] \leq \epsilon$ as long as ...

Figures (13)

  • Figure 1: Sending and receiving models in standard DL (left) and Mosaic Learning (right).
  • Figure 2: Eigenvalues of $M^{\top}M$ as a function of $K$ for two examples with $n=50$ nodes and $d=16$ parameters. Increasing $K$ decreases the contraction factor, improving consensus.
  • Figure 3: Consensus distance $\|X_t - \bar{X}_t\|^2$ as a function of the number of fragments $K$ for two examples with $n=50$ nodes and $d=16$ parameters. Increasing $K$ improves consensus speed.
  • Figure 4: Performance of Mosaic Learning across iterations and number of fragments ($K$) for CIFAR-10 (top), CIFAR-100 (middle), and MovieLens (bottom), showing node-average (left) and model-average (right) test accuracies.
  • Figure 5: Consensus distance (left) and standard deviation of node performance (right) across iterations and $K$, for CIFAR-10 (top), CIFAR-100 (middle), and MovieLens (bottom), on a network with degree 8.
  • ...and 8 more figures
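
The consensus distance reported in Figures 3 and 5 can be computed along the following lines. This is a small sketch that assumes $X_t$ stacks the $n$ node models as rows of an $(n, d)$ array, that $\bar{X}_t$ repeats the across-node average in every row, and that the norm is the Frobenius norm; the function name `consensus_distance` is hypothetical.

```python
import numpy as np


def consensus_distance(X):
    """Squared Frobenius distance between node models and their average.

    X : (n, d) array, row i holds node i's parameters (assumed layout).
    """
    x_bar = X.mean(axis=0, keepdims=True)  # average model, shape (1, d)
    return float(np.sum((X - x_bar) ** 2))


# Two nodes with two parameters each; the average model is [1, 1].
X = np.array([[0.0, 0.0], [2.0, 2.0]])
print(consensus_distance(X))  # 4.0
```

A value of zero means all nodes hold identical models.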

Theorems & Definitions (17)

  • Remark 1
  • Theorem 1: Convergence of Mosaic Learning
  • Remark 2
  • Remark 3
  • Lemma 1: Consensus error evolution
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • ...and 7 more