Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning

Muyun Li; Aaron Fainman; Stefan Vlaski

TL;DR

This work tackles decentralized training of deep networks on non-IID data by shifting from parameter-space consensus to function-space consensus using Deep Relative Trust (DRT). By formulating a penalty based on output similarity and deriving layer-wise, time-varying mixing, the authors provide convergence guarantees for the network centroid and bounded disagreement under standard stochastic-gradient assumptions. Empirical results on CIFAR-10 with ResNet-20 show that DRT diffusion improves steady-state accuracy and reduces generalization gaps on sparse topologies, while maintaining convergence speed comparable to fast-mixing diffusion. The proposed approach highlights the potential of leveraging over-parameterization to promote function-level consensus, enabling robust decentralized learning in communication-constrained or irregular networks.
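The layer-wise mixing mentioned above can be pictured with a small sketch. It assumes each agent holds one flattened weight vector per layer and that a separate mixing matrix is supplied for every layer. The function name `layerwise_combine` and the data layout are illustrative only, and the rule by which DRT diffusion computes the weights is not reproduced here.

```python
from typing import List

Vector = List[float]   # flattened weights of one layer
Model = List[Vector]   # one vector per layer


def layerwise_combine(models: List[Model],
                      mix: List[List[List[float]]],
                      k: int) -> Model:
    """Combine all agents' models for agent k, one layer at a time.

    mix[p][l][k] is the (assumed given) weight that agent k places on
    agent l's copy of layer p. A zero weight means l is not a neighbor.
    """
    combined: Model = []
    for p in range(len(models[0])):
        layer = [0.0] * len(models[0][p])
        for l, model in enumerate(models):
            a = mix[p][l][k]
            if a == 0.0:
                continue  # skip non-neighbors
            for j, value in enumerate(model[p]):
                layer[j] += a * value
        combined.append(layer)
    return combined


# Two agents, two layers (toy numbers).
models = [
    [[1.0, 1.0], [0.0]],   # agent 0
    [[3.0, 3.0], [4.0]],   # agent 1
]
mix = [
    [[0.5, 0.5], [0.5, 0.5]],   # layer 0: uniform averaging
    [[1.0, 0.0], [0.0, 1.0]],   # layer 1: identity, no mixing
]
new_model_agent0 = layerwise_combine(models, mix, k=0)
```

In this toy run the first layer is averaged across agents while the second is left untouched. Allowing the weights to differ per layer is what separates this step from a single parameter-space average.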

Abstract

Decentralized learning strategies allow a collection of agents to learn efficiently from local data sets without the need for central aggregation or orchestration. Current decentralized learning paradigms typically rely on an averaging mechanism to encourage agreement in the parameter space. We argue that in the context of deep neural networks, which are often over-parameterized, encouraging consensus of the neural network outputs, as opposed to their parameters, can be more appropriate. This motivates the development of a new decentralized learning algorithm, termed DRT diffusion, based on deep relative trust (DRT), a recently introduced similarity measure for neural networks. We provide convergence analysis for the proposed strategy, and numerically establish its benefit to generalization, especially with sparse topologies, in an image classification task.
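To make the notion of a layer-aware similarity measure concrete, the sketch below computes a DRT-style distance between two networks. It assumes the product-of-relative-perturbations form commonly associated with deep relative trust, namely $\prod_{p}\left(1 + \|v^{(p)} - w^{(p)}\| / \|w^{(p)}\|\right) - 1$. The exact penalty used by DRT diffusion is not stated in this section and may differ. The names `drt_distance` and `layer_norm` are illustrative.

```python
import math
from typing import List

Model = List[List[float]]  # one flattened weight vector per layer


def layer_norm(layer: List[float]) -> float:
    """Euclidean (Frobenius) norm of a flattened layer."""
    return math.sqrt(sum(x * x for x in layer))


def drt_distance(w: Model, v: Model) -> float:
    """DRT-style distance of v from w (assumed form, see lead-in).

    Each layer contributes a factor 1 + ||v_p - w_p|| / ||w_p||.
    Every layer of w is assumed to have a nonzero norm.
    """
    product = 1.0
    for w_p, v_p in zip(w, v):
        diff = layer_norm([b - a for a, b in zip(w_p, v_p)])
        product *= 1.0 + diff / layer_norm(w_p)
    return product - 1.0


w = [[3.0, 4.0], [1.0]]
v = [[3.0, 4.0], [2.0]]   # differs from w only in the second layer
d_wv = drt_distance(w, v)
d_vw = drt_distance(v, w)
```

Under this assumed form the measure is relative to the reference network, so `drt_distance(w, v)` and `drt_distance(v, w)` generally differ, and identical networks are at distance zero.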

Paper Structure

This paper contains 13 sections, 5 theorems, 51 equations, 2 figures, 1 table.

Introduction
DRT Diffusion
Convergence Analysis
Network Centroid
Network Disagreement and Descent
Simulation Results
Simulation Setup
Results and Discussion
Steady-state Performance
Convergence Rate
Generalization Gap
Proof of Lemma 3
Proof of Theorem 1

Key Result

Lemma 1

Under Assumption \ref{assp:sc} and the construction \eqref{eqn:matrixCons}, the graph represented by the weighted combination matrix $\bm{A}_{i}^{(p)} \triangleq \left[\bm{a}_{\ell k, i}^{(p)}\right]$ is compatible with the graph described by $C$ for all $p$ and all $i$. Moreover, for all $p$ and all $i$, the nonzero elements of the mixing matrices are bounded from below. ∎
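The displayed conditions of the lemma are not reproduced in this section, so the following sketch checks only one plausible reading of it. The assumption is that "compatible" means a mixing matrix is nonzero exactly where $C$ is nonzero, and that every nonzero entry is at least some constant. The function `is_compatible` and the parameter `floor` are hypothetical names, and the actual constant in the lemma is not known here.

```python
from typing import List

Matrix = List[List[float]]


def is_compatible(a: Matrix, c: Matrix, floor: float) -> bool:
    """Check an assumed reading of graph compatibility.

    Returns True when a[l][k] is nonzero exactly where c[l][k] is
    nonzero, and every such entry of a is at least `floor`.
    """
    n = len(c)
    for l in range(n):
        for k in range(n):
            edge = c[l][k] != 0.0
            if not edge and a[l][k] != 0.0:
                return False  # weight placed on a missing edge
            if edge and a[l][k] < floor:
                return False  # edge weight vanishes or is too small
    return True


# Path graph on three agents with self-loops (toy example).
c = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0],
     [0.0, 1.0, 1.0]]
a_ok = [[0.5, 0.25, 0.0],
        [0.5, 0.5, 0.5],
        [0.0, 0.25, 0.5]]
a_bad = [[0.5, 0.25, 0.1],   # agents 0 and 2 are not connected
         [0.5, 0.5, 0.4],
         [0.0, 0.25, 0.5]]
```

A check of this kind would be applied to every layer $p$ and every iteration $i$, since the lemma is stated for all $p$ and all $i$.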

Figures (2)

Figure 1: Learning curves for a decentralized network with 16 agents, employing ResNet-20 on CIFAR-10 with non-IID data at each agent
Figure 2: Generalization gap for a decentralized network with 16 agents, employing ResNet-20 on CIFAR-10 with non-IID data at each agent

Theorems & Definitions (9)

Lemma 1: Graph-compatible $\bm{A}_{i}^{(p)}$
proof
Lemma 2: Time-varying weight vector [Tsitsiklis84]
Lemma 3: Network Disagreement
proof
Theorem 1: Descent Relation
proof
Lemma 4: Perturbation bounds
proof
