Peer-to-Peer Learning Dynamics of Wide Neural Networks
Shreyas Chaudhari, Srinivasa Pranav, Emile Anand, José M. F. Moura
TL;DR
This paper studies how wide neural networks behave under peer-to-peer, privacy-preserving distributed training. It uses Neural Tangent Kernel (NTK) theory to linearize the network dynamics around initialization, which yields time-invariant gradient-flow descriptions of distributed gradient methods on graphs, with closed-form solutions expressed through a state-transition matrix $\boldsymbol{\Phi}(t,t_0)$. Key contributions include analytic, per-agent predictions of parameter and error trajectories under distributed gradient descent (DGD), adapt-then-combine (ATC), and combine-then-adapt (CTA) updates, along with a stability result for DGD with doubly stochastic mixing. These predictions are validated against affine and nonlinear models on various graph topologies, where the global objective $L(\boldsymbol{\theta}) = \frac{1}{Q}\sum_q L_q(\boldsymbol{\theta})$ is the average of the $Q$ local losses $L_q$. The findings support practical guidance for architecture and hyperparameter tuning in beyond-5G wireless edge environments and provide a foundation for analyzing nonconvex distributed optimization in peer-to-peer settings.
Abstract
Peer-to-peer learning is an increasingly popular framework that enables beyond-5G distributed edge devices to collaboratively train deep neural networks in a privacy-preserving manner without the aid of a central server. Neural network training algorithms for emerging environments, e.g., smart cities, involve many design choices, such as neural network architectures and hyperparameters, that are difficult to tune in deployment settings. This creates a critical need to characterize the training dynamics of distributed optimization algorithms used to train highly nonconvex neural networks in peer-to-peer learning environments. In this work, we provide an explicit characterization of the learning dynamics of wide neural networks trained using popular distributed gradient descent (DGD) algorithms. Our results leverage both recent advancements in neural tangent kernel (NTK) theory and extensive previous work on distributed learning and consensus. We validate our analytical results by accurately predicting the parameter and error dynamics of wide neural networks trained for classification tasks.
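
To make the idea of closed-form dynamics concrete, the following is a minimal sketch, not the paper's implementation. It assumes a model that is linear in its parameters (a stand-in for an NTK-linearized wide network), noise-free local least-squares losses, a ring graph with a symmetric doubly stochastic mixing matrix, and the standard discrete-time DGD update $\boldsymbol{\theta}_i \leftarrow \sum_j W_{ij}\boldsymbol{\theta}_j - \eta \nabla L_i(\boldsymbol{\theta}_i)$. Under these assumptions the stacked per-agent error evolves linearly, so a matrix power (a discrete-time analog of a state-transition matrix) predicts the trajectory exactly. All function names, sizes, and step sizes are hypothetical choices made for illustration.

```python
import numpy as np

def ring_mixing_matrix(num_agents, self_weight=0.5):
    """Symmetric, doubly stochastic mixing matrix for a ring graph (assumed topology)."""
    W = np.zeros((num_agents, num_agents))
    neighbor_weight = (1.0 - self_weight) / 2.0
    for i in range(num_agents):
        W[i, i] = self_weight
        W[i, (i - 1) % num_agents] += neighbor_weight
        W[i, (i + 1) % num_agents] += neighbor_weight
    return W

def run_dgd(W, X_parts, y_parts, step_size, num_steps):
    """Iterate DGD: each agent mixes neighbor parameters, then takes a local gradient step."""
    num_agents, dim = W.shape[0], X_parts[0].shape[1]
    thetas = np.zeros((num_agents, dim))  # row i holds agent i's parameters
    for _ in range(num_steps):
        # Gradient of the local loss (1 / 2n) * ||X theta - y||^2 at each agent's own iterate.
        grads = np.stack([
            X.T @ (X @ theta - y) / X.shape[0]
            for X, y, theta in zip(X_parts, y_parts, thetas)
        ])
        thetas = W @ thetas - step_size * grads
    return thetas

def closed_form_error(W, X_parts, theta_star, step_size, num_steps):
    """Predict the stacked error after num_steps via e_k = M^k e_0 (zero initialization)."""
    num_agents, dim = W.shape[0], theta_star.size
    H = np.zeros((num_agents * dim, num_agents * dim))
    for q, X in enumerate(X_parts):
        # Block-diagonal matrix of local Hessians X_q^T X_q / n_q.
        H[q * dim:(q + 1) * dim, q * dim:(q + 1) * dim] = X.T @ X / X.shape[0]
    M = np.kron(W, np.eye(dim)) - step_size * H
    e0 = -np.tile(theta_star, num_agents)  # error of the all-zeros starting point
    return (np.linalg.matrix_power(M, num_steps) @ e0).reshape(num_agents, dim)

rng = np.random.default_rng(0)
num_agents, dim, samples_per_agent = 5, 3, 20
theta_star = rng.normal(size=dim)
X_parts = [rng.normal(size=(samples_per_agent, dim)) for _ in range(num_agents)]
y_parts = [X @ theta_star for X in X_parts]  # noise-free targets shared by all agents
W = ring_mixing_matrix(num_agents)

# Compare simulated DGD iterates with the closed-form prediction after 50 steps.
thetas_short = run_dgd(W, X_parts, y_parts, step_size=0.05, num_steps=50)
predicted = closed_form_error(W, X_parts, theta_star, 0.05, 50) + theta_star

# Run longer to check that every agent approaches the shared minimizer.
thetas_final = run_dgd(W, X_parts, y_parts, step_size=0.05, num_steps=5000)
```

In this toy setting, `predicted` should agree with `thetas_short` up to floating-point error, and each row of `thetas_final` should approach `theta_star`. Exact convergence here relies on the noise-free assumption; with inconsistent local data and a constant step size, DGD is generally only expected to reach a neighborhood of the global minimizer.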
