DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

Nastaran Saadati; Minh Pham; Nasla Saleem; Joshua R. Waite; Aditya Balu; Zhanhong Jiang; Chinmay Hegde; Soumik Sarkar


TL;DR

DIMAT introduces a decentralized iterative merging-and-training framework that replaces costly consensus averaging with activation-matching-based weight merging across neighbors. By integrating a model-merging operator with a carefully designed mixing matrix, it provably converges for nonconvex objectives and yields tighter error bounds than prior decentralized methods. Theoretical results show that DIMAT attains the best available rate for nonconvex functions with an improved spectral gap $1-\rho'$, while empirical evaluations on CIFAR-100, CIFAR-10, and Tiny ImageNet demonstrate faster initial gains and reduced communication overhead under both IID and non-IID data. This approach offers a practical, scalable path for updating large pre-trained models in distributed, resource-constrained environments.
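The alternating train-then-merge structure can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions: each of the N agents holds a scalar model with a hypothetical local loss $f_i(w) = \tfrac{1}{2}(w - a_i)^2$, and plain mixing-matrix averaging stands in for DIMAT's activation-matching merge operator (a scalar model has no hidden units to align). The names `train_then_merge`, `ring_mixing_matrix`, and `average_merge`, and the ring weights of 1/3, are illustrative choices, not taken from the paper.

```python
from typing import Callable, List

Matrix = List[List[float]]
MergeFn = Callable[[List[float], Matrix], List[float]]


def ring_mixing_matrix(n: int) -> Matrix:
    """Hypothetical ring topology: weight 1/3 on self and on each of two neighbors (n >= 3).

    The result is symmetric and doubly stochastic (rows and columns sum to 1).
    """
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] += 1.0 / 3.0
    return w


def average_merge(params: List[float], mixing: Matrix) -> List[float]:
    """Stand-in merge: agent i takes sum_j mixing[i][j] * params[j] (no activation matching)."""
    n = len(params)
    return [sum(mixing[i][j] * params[j] for j in range(n)) for i in range(n)]


def train_then_merge(
    targets: List[float],
    mixing: Matrix,
    rounds: int = 500,
    lr: float = 0.1,
    local_steps: int = 1,
    merge_fn: MergeFn = average_merge,
) -> List[float]:
    """Alternate local gradient steps on f_i(w) = 0.5 * (w - a_i)^2 with a merge step."""
    params = [0.0] * len(targets)
    for _ in range(rounds):
        # Training phase: every agent updates only on its own local objective.
        for _ in range(local_steps):
            params = [w - lr * (w - a) for w, a in zip(params, targets)]
        # Merging phase: every agent combines its model with its neighbors' models.
        params = merge_fn(params, mixing)
    return params


targets = [1.0, 2.0, 3.0, 4.0, 5.0]
final = train_then_merge(targets, ring_mixing_matrix(len(targets)))
print([round(w, 3) for w in final])
print(sum(final) / len(final))  # network average approaches mean(targets) = 3.0
```

Because the assumed mixing matrix is doubly stochastic, the merge step preserves the network average, so the average is driven to the minimizer of $\sum_i f_i$ while individual agents stay within a small consensus error of it. Swapping `merge_fn` for an alignment-aware operator is where a DIMAT-style merge would plug in.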

Abstract

Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pre-trained models. However, achieving this level of performance incurs significant communication and computation overhead when updating these models, which hinders their application in real-world scenarios. To address this issue, we draw inspiration from advanced model merging techniques that do not require additional training and introduce the Decentralized Iterative Merging-And-Training (DIMAT) paradigm, a novel decentralized deep learning framework. Within DIMAT, each agent is trained on its local data and periodically merged with its neighboring agents using advanced model merging techniques such as activation matching, until convergence is achieved. DIMAT provably converges with the best available rate for nonconvex functions with various first-order methods, while yielding tighter error bounds than popular existing approaches. We conduct a comprehensive empirical analysis to validate DIMAT's superiority over baselines across diverse computer vision tasks sourced from multiple datasets. The empirical results validate our theoretical claims, showing that DIMAT attains a faster and higher initial gain in accuracy with both independent and identically distributed (IID) and non-IID data while incurring lower communication overhead. The DIMAT paradigm thus presents a new opportunity for future decentralized learning, enhancing its adaptability to real-world settings with sparse, lightweight communication and computation.
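As a rough illustration of what merging with activation matching can mean, the sketch below aligns the hidden units of two one-hidden-layer ReLU networks before averaging their weights. Assumptions made here for illustration: the model shape, a shared probe batch `inputs`, Pearson correlation as the matching score, and brute-force search over permutations (feasible only for a handful of units). Practical implementations typically solve the assignment with a linear-assignment solver such as `scipy.optimize.linear_sum_assignment`, and DIMAT's operator combines several neighbors via the mixing matrix rather than a single pair. All function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Model = Tuple[Matrix, Matrix]  # (W1: hidden x input, W2: output x hidden)


def hidden_activations(w1: Matrix, inputs: Sequence[Sequence[float]]) -> Matrix:
    """ReLU activations indexed [hidden_unit][example]."""
    return [[max(0.0, sum(wk * xk for wk, xk in zip(row, x))) for x in inputs] for row in w1]


def correlation(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Pearson correlation; eps keeps constant (e.g. dead) units from dividing by zero."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv + eps)


def match_units(acts_a: Matrix, acts_b: Matrix) -> Tuple[int, ...]:
    """perm[i] is the unit of model B matched to unit i of model A.

    Brute force over all permutations: exponential, so only for tiny widths.
    """
    width = len(acts_a)
    corr = [[correlation(acts_a[i], acts_b[j]) for j in range(width)] for i in range(width)]
    return max(
        itertools.permutations(range(width)),
        key=lambda p: sum(corr[i][p[i]] for i in range(width)),
    )


def interpolate(a: Matrix, b: Matrix, alpha: float) -> Matrix:
    return [[alpha * x + (1 - alpha) * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def align_and_merge(
    model_a: Model, model_b: Model, inputs: Sequence[Sequence[float]], alpha: float = 0.5
) -> Model:
    """Permute B's hidden units to match A's activations, then interpolate weights."""
    (w1_a, w2_a), (w1_b, w2_b) = model_a, model_b
    perm = match_units(hidden_activations(w1_a, inputs), hidden_activations(w1_b, inputs))
    w1_b_aligned = [w1_b[p] for p in perm]                   # reorder rows of W1
    w2_b_aligned = [[row[p] for p in perm] for row in w2_b]  # reorder columns of W2
    return interpolate(w1_a, w1_b_aligned, alpha), interpolate(w2_a, w2_b_aligned, alpha)


# Model B is model A with its hidden units shuffled, so both compute the same function.
w1_a = [[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]]
w2_a = [[1.0, -2.0, 0.5]]
order = [2, 0, 1]
w1_b = [w1_a[q] for q in order]
w2_b = [[row[q] for q in order] for row in w2_a]
probe = [[0.1, 0.9], [0.8, 0.3], [0.5, 0.5], [0.9, 0.7], [0.2, 0.4]]
merged_w1, merged_w2 = align_and_merge((w1_a, w2_a), (w1_b, w2_b), probe)
```

In this constructed case alignment makes the two models coincide, so the merge returns model A's weights; averaging without the permutation step would instead blend unrelated units.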


Paper Structure

This paper contains 31 sections, 14 theorems, 61 equations, 11 figures, 5 tables, and 3 algorithms.

Related Work
Methodology
Preliminaries: Activation Matching
Problem Formulation
Algorithmic Framework
Main Results
Assumptions
Convergence Analysis
Experimental Results
Comparison of Algorithms in IID Setting
Comparison of Algorithms in Non-IID Setting
Conclusions
Acknowledgements
Additional Analysis
Algorithmic Frameworks
...and 16 more sections

Key Result

Theorem 1 [schacke2004kronecker]

Let $\mathbf{C}\in\mathbb{R}^{N\times N}$ and $\mathbf{D}\in\mathbb{R}^{d\times d}$, with eigenvalue $\lambda\in s(\mathbf{C})$ and corresponding eigenvector $x\in\mathbb{C}^{N}$, and $\mu\in s(\mathbf{D})$ with corresponding eigenvector $y\in\mathbb{C}^{d}$, where $s(\cdot)$ s…
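Theorem 1 is the standard eigenvalue property of the Kronecker product. Assuming the truncated statement concludes in the usual way, namely that $\lambda\mu$ is an eigenvalue of $\mathbf{C}\otimes\mathbf{D}$ with eigenvector $x\otimes y$, the sketch below checks that identity numerically on small hand-picked matrices. Decentralized analyses commonly apply it with $\mathbf{C}$ a mixing matrix over $N$ agents and $\mathbf{D}=\mathbf{I}_d$, so that the stacked parameter update inherits the mixing matrix's spectrum; whether the paper uses it in exactly this way is an assumption here. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product a ⊗ b: entry (i*p + k, j*q + l) equals a[i][j] * b[k][l]."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def kron_vec(x: Vector, y: Vector) -> Vector:
    """Kronecker product of two vectors (x is the outer index, y the inner index)."""
    return [xi * yj for xi in x for yj in y]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Hand-picked eigenpairs: C = [[2, 1], [1, 2]] has (3, [1, 1]) and (1, [1, -1]);
# D = [[4, 1], [0, 3]] has (4, [1, 0]) and (3, [1, -1]).
C = [[2.0, 1.0], [1.0, 2.0]]
D = [[4.0, 1.0], [0.0, 3.0]]
eig_c = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]
eig_d = [(4.0, [1.0, 0.0]), (3.0, [1.0, -1.0])]

CD = kron(C, D)
for lam, x in eig_c:
    for mu, y in eig_d:
        v = kron_vec(x, y)
        # (C ⊗ D)(x ⊗ y) should equal (lam * mu)(x ⊗ y).
        assert all(abs(l - lam * mu * vi) < 1e-12 for l, vi in zip(matvec(CD, v), v))
print("all", len(eig_c) * len(eig_d), "eigenpairs verified")
```

With $\mathbf{D}=\mathbf{I}_d$, every eigenvalue of $\mathbf{C}$ simply appears $d$ times in the spectrum of $\mathbf{C}\otimes\mathbf{I}_d$.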

Figures (11)

Figure 1: Accuracy (mean$\pm$std) of the compared algorithms in fully connected (FC) (a) and ring (b) topologies with the ResNet-20 architecture on CIFAR-100 IID data; (c) shows scalability with ResNet-20 on CIFAR-100 IID data in the fully connected topology.
Figure 2: Accuracy (mean$\pm$std) of the compared algorithms in fully connected (FC) (a) and ring (b) topologies with the VGG16 architecture on CIFAR-100 IID data; (c) shows scalability with VGG16 on CIFAR-100 IID data in the fully connected topology.
Figure 3: Accuracy (mean$\pm$std) of the compared algorithms in the fully connected topology with the ResNet-20 (a) and VGG16 (b) architectures on CIFAR-100 non-IID data for 5 agents.
Figure 4: Scalability analysis for DIMAT: accuracy (mean$\pm$std) in the fully connected topology with ResNet-20 (a) and VGG16 (b) on CIFAR-100 non-IID data across 5 to 10 agents.
Figure 5: Accuracy (mean$\pm$std) of the compared algorithms in fully connected (a) and ring (b) topologies with the ResNet-20 architecture on CIFAR-10 IID data for 5 agents.
...and 6 more figures
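The figures compare fully connected and ring topologies and scale from 5 to 10 agents. A quantity commonly used to explain such differences is the spectral gap $1-\rho$ of the mixing matrix, where $\rho$ is the largest eigenvalue magnitude other than the leading eigenvalue 1. The sketch below computes it for two assumed doubly stochastic mixing matrices: uniform weights $1/N$ for the fully connected graph, and weight $1/3$ on self and on each neighbor for the ring. These weights are illustrative assumptions, and the sketch computes the plain $1-\rho$, not the paper's $1-\rho'$, which also accounts for the merging operator.

```python
import math
from typing import List


def symmetric_circulant_eigenvalues(first_row: List[float]) -> List[float]:
    """Eigenvalues of a symmetric circulant matrix: lambda_k = sum_j c_j cos(2*pi*j*k/n)."""
    n = len(first_row)
    return [
        sum(c * math.cos(2 * math.pi * j * k / n) for j, c in enumerate(first_row))
        for k in range(n)
    ]


def ring_row(n: int) -> List[float]:
    """First row of the assumed ring mixing matrix (n >= 3)."""
    row = [0.0] * n
    row[0] = row[1] = row[n - 1] = 1.0 / 3.0
    return row


def fully_connected_row(n: int) -> List[float]:
    """First row of the assumed fully connected mixing matrix (uniform 1/n weights)."""
    return [1.0 / n] * n


def second_eigenvalue_magnitude(first_row: List[float]) -> float:
    """rho = max_{k != 0} |lambda_k|; k = 0 is the eigenvalue 1 of a stochastic matrix."""
    return max(abs(lam) for lam in symmetric_circulant_eigenvalues(first_row)[1:])


for n in range(5, 11):
    gap_ring = 1.0 - second_eigenvalue_magnitude(ring_row(n))
    gap_fc = 1.0 - second_eigenvalue_magnitude(fully_connected_row(n))
    print(f"N={n:2d}  ring gap={gap_ring:.3f}  fully connected gap={gap_fc:.3f}")
```

Under these assumed weights the fully connected gap stays at 1, while for the values of $N$ shown the ring gap equals $\tfrac{2}{3}\left(1-\cos(2\pi/N)\right)$ and shrinks as agents are added, which is one standard reason sparse topologies mix information more slowly.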

Theorems & Definitions (19)

Theorem 1
Theorem 2
Remark 1
Corollary 1
Remark 2
Theorem 3
Corollary 2
Theorem 4
Lemma 1
Lemma 2
...and 9 more
