Deep-Relative-Trust-Based Diffusion for Decentralized Deep Learning
Muyun Li, Aaron Fainman, Stefan Vlaski
TL;DR
This work tackles decentralized training of deep networks on non-IID data by shifting from parameter-space consensus to function-space consensus using Deep Relative Trust (DRT). By formulating a penalty based on output similarity and deriving layer-wise, time-varying mixing, the authors provide convergence guarantees for the network centroid and bounded disagreement under standard stochastic-gradient assumptions. Empirical results on CIFAR-10 with ResNet-20 show that DRT diffusion improves steady-state accuracy and reduces generalization gaps on sparse topologies, while maintaining convergence speed comparable to fast-mixing diffusion. The proposed approach highlights the potential of leveraging over-parameterization to promote function-level consensus, enabling robust decentralized learning in communication-constrained or irregular networks.
Abstract
Decentralized learning strategies allow a collection of agents to learn efficiently from local data sets without the need for central aggregation or orchestration. Current decentralized learning paradigms typically rely on an averaging mechanism to encourage agreement in the parameter space. We argue that in the context of deep neural networks, which are often over-parameterized, encouraging consensus of the neural network outputs, as opposed to their parameters, can be more appropriate. This motivates the development of a new decentralized learning algorithm, termed DRT diffusion, based on deep relative trust (DRT), a recently introduced similarity measure for neural networks. We provide convergence analysis for the proposed strategy, and numerically establish its benefit to generalization, especially with sparse topologies, in an image classification task.
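The contrast between parameter-space and output-space agreement can be illustrated with a small sketch. It is not the DRT diffusion algorithm and is not taken from the paper. It relies only on the well-known permutation symmetry of hidden units: two agents whose networks differ by a reordering of hidden units compute the same function while being far apart in parameter space, and naively averaging their parameters generally yields a different function. The toy one-dimensional tanh network, the helper names (`mlp`, `permute_hidden`, `average_params`), and the random initialization are all assumptions made for illustration.

```python
import math
import random

def mlp(params, x):
    """Toy two-layer network: y = sum_j v[j] * tanh(w[j] * x + b[j])."""
    w, b, v = params
    return sum(vj * math.tanh(wj * x + bj) for wj, bj, vj in zip(w, b, v))

def permute_hidden(params, perm):
    """Reorder hidden units; the computed function is unchanged."""
    w, b, v = params
    return ([w[i] for i in perm], [b[i] for i in perm], [v[i] for i in perm])

def average_params(p, q):
    """Plain parameter-space averaging of two agents' models."""
    return tuple([(a + c) / 2 for a, c in zip(u, s)] for u, s in zip(p, q))

def param_distance(p, q):
    """Euclidean distance between the stacked parameter vectors."""
    return math.sqrt(sum((a - c) ** 2 for u, s in zip(p, q) for a, c in zip(u, s)))

def output_distance(p, q, xs):
    """Largest output gap between two models over the probe inputs xs."""
    return max(abs(mlp(p, x) - mlp(q, x)) for x in xs)

rng = random.Random(0)
hidden = 4
# Hypothetical agent A: random weights w, biases b, output weights v.
agent_a = tuple([rng.uniform(-2, 2) for _ in range(hidden)] for _ in range(3))
# Hypothetical agent B: the same function with hidden units in reverse order.
agent_b = permute_hidden(agent_a, [3, 2, 1, 0])
averaged = average_params(agent_a, agent_b)

xs = [i / 10 for i in range(-20, 21)]
d_param = param_distance(agent_a, agent_b)        # large: parameters disagree
d_out_perm = output_distance(agent_a, agent_b, xs)  # ~0: outputs agree
d_out_avg = output_distance(agent_a, averaged, xs)  # nonzero: averaging changed the function

print(f"parameter distance A vs B:   {d_param:.3f}")
print(f"output distance A vs B:      {d_out_perm:.2e}")
print(f"output distance A vs average: {d_out_avg:.3f}")
```

In this toy setting the two agents already agree at the output level, yet a parameter-averaging step moves both away from the function they share. This is one simple reason a similarity measure defined on outputs may be a better target for consensus than raw parameter distance.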
