Peer-to-Peer Learning Dynamics of Wide Neural Networks

Shreyas Chaudhari, Srinivasa Pranav, Emile Anand, José M. F. Moura

TL;DR

The paper studies how wide neural networks behave under peer-to-peer, privacy-preserving distributed training. It uses neural tangent kernel (NTK) theory to linearize the network around its initialization, which turns distributed gradient methods on graphs into time-invariant gradient flows with closed-form solutions expressed through a state-transition matrix $\boldsymbol{\Phi}(t,t_0)$. Key contributions include analytic, per-agent predictions of parameter and error trajectories under distributed gradient descent (DGD), adapt-then-combine (ATC), and combine-then-adapt (CTA), along with a stability result for DGD with doubly stochastic mixing. These predictions are validated on affine and nonlinear models over various graph topologies, where the objective is the average of the local losses, $L(\boldsymbol{\theta}) = \frac{1}{Q}\sum_q L_q(\boldsymbol{\theta})$. The findings support practical guidance for architecture and hyperparameter tuning in beyond-5G wireless edge environments and provide a foundation for analyzing nonconvex distributed optimization in peer-to-peer settings.
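The linearization around initialization mentioned above can be illustrated with a minimal NumPy sketch. It is not the paper's code: the two-layer tanh network, the $1/\sqrt{\text{fan-in}}$ scaling, and the names `init_params`, `forward`, `grad_params`, and `linearized_forward` are all assumptions chosen for illustration. The sketch compares a network with its first-order Taylor expansion $f(\boldsymbol{\theta}_0) + \nabla f(\boldsymbol{\theta}_0)^\top(\boldsymbol{\theta} - \boldsymbol{\theta}_0)$.

```python
import numpy as np

def init_params(rng, d_in, width):
    # Assumed NTK-style parameterization: unit-variance weights,
    # with the 1/sqrt(fan-in) scaling applied in the forward pass.
    return {"W1": rng.standard_normal((width, d_in)),
            "w2": rng.standard_normal(width)}

def forward(params, x):
    # Two-layer tanh network with a scalar output.
    d_in, width = x.shape[0], params["w2"].shape[0]
    h = np.tanh(params["W1"] @ x / np.sqrt(d_in))
    return params["w2"] @ h / np.sqrt(width)

def grad_params(params, x):
    # Analytic gradient of the scalar output w.r.t. every parameter.
    d_in, width = x.shape[0], params["w2"].shape[0]
    h = np.tanh(params["W1"] @ x / np.sqrt(d_in))
    g_w2 = h / np.sqrt(width)
    g_W1 = np.outer(params["w2"] * (1.0 - h**2), x) / np.sqrt(width * d_in)
    return {"W1": g_W1, "w2": g_w2}

def linearized_forward(params0, params, x):
    # First-order Taylor expansion of the network around params0.
    g = grad_params(params0, x)
    delta = sum(np.sum(g[k] * (params[k] - params0[k])) for k in g)
    return forward(params0, x) + delta

# Usage: compare the network and its linearization after a small parameter change.
rng = np.random.default_rng(0)
theta0 = init_params(rng, d_in=8, width=512)
x = rng.standard_normal(8)
theta = {k: v + 1e-3 * rng.standard_normal(v.shape) for k, v in theta0.items()}
gap = abs(forward(theta, x) - linearized_forward(theta0, theta, x))
print("linearization gap:", gap)
```

The gap is a second-order quantity in the parameter change, which is why the linearized model can stand in for the network as long as the parameters stay close to their initial values.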

Abstract

Peer-to-peer learning is an increasingly popular framework that enables beyond-5G distributed edge devices to collaboratively train deep neural networks in a privacy-preserving manner without the aid of a central server. Neural network training algorithms for emerging environments, e.g., smart cities, have many design considerations that are difficult to tune in deployment settings -- such as neural network architectures and hyperparameters. This presents a critical need for characterizing the training dynamics of distributed optimization algorithms used to train highly nonconvex neural networks in peer-to-peer learning environments. In this work, we provide an explicit characterization of the learning dynamics of wide neural networks trained using popular distributed gradient descent (DGD) algorithms. Our results leverage both recent advancements in neural tangent kernel (NTK) theory and extensive previous work on distributed learning and consensus. We validate our analytical results by accurately predicting the parameter and error dynamics of wide neural networks trained for classification tasks.
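To make the idea of predicting per-agent parameter dynamics concrete, here is a hedged discrete-time sketch of DGD for a model that is linear in its parameters. The paper works with gradient flows; the discrete iteration, the least-squares local losses, and the function names `dgd_per_agent` and `dgd_stacked` are assumptions made for illustration. With quadratic local losses, the stacked update is a linear time-invariant system, so the iterate after $k$ steps follows from a single matrix power, which plays the role of a discrete state-transition matrix.

```python
import numpy as np

def dgd_per_agent(W, Xs, ys, theta0, alpha, steps):
    # theta0 has shape (Q, d): one parameter row per agent.
    theta = theta0.copy()
    for _ in range(steps):
        # Local least-squares gradients, each evaluated at the agent's own parameters.
        grads = np.stack([X.T @ (X @ th - y) / len(y)
                          for X, y, th in zip(Xs, ys, theta)])
        # Mix with neighbors through W, then take a local gradient step.
        theta = W @ theta - alpha * grads
    return theta

def dgd_stacked(W, Xs, ys, theta0, alpha, steps):
    # Same iteration written as v_{k+1} = M v_k + alpha * c on the stacked vector v.
    Q, d = theta0.shape
    H = np.zeros((Q * d, Q * d))
    c = np.zeros(Q * d)
    for q, (X, y) in enumerate(zip(Xs, ys)):
        blk = slice(q * d, (q + 1) * d)
        H[blk, blk] = X.T @ X / len(y)
        c[blk] = X.T @ y / len(y)
    M = np.kron(W, np.eye(d)) - alpha * H
    # Augment the state with a constant 1 so the affine term is absorbed
    # into one matrix; its k-th power gives the k-step prediction directly.
    A = np.zeros((Q * d + 1, Q * d + 1))
    A[:-1, :-1] = M
    A[:-1, -1] = alpha * c
    A[-1, -1] = 1.0
    state = np.append(theta0.reshape(-1), 1.0)
    return (np.linalg.matrix_power(A, steps) @ state)[:-1].reshape(Q, d)

# Usage: four agents on a ring with a symmetric doubly stochastic mixing matrix.
rng = np.random.default_rng(0)
Q, d, n = 4, 3, 10
Xs = [rng.standard_normal((n, d)) for _ in range(Q)]
ys = [rng.standard_normal(n) for _ in range(Q)]
S = np.roll(np.eye(Q), 1, axis=1)
W = 0.5 * np.eye(Q) + 0.25 * (S + S.T)
theta0 = rng.standard_normal((Q, d))
simulated = dgd_per_agent(W, Xs, ys, theta0, alpha=0.05, steps=50)
predicted = dgd_stacked(W, Xs, ys, theta0, alpha=0.05, steps=50)
print("max deviation:", np.max(np.abs(simulated - predicted)))
```

The augmented-matrix form avoids inverting $\mathbf{I} - \mathbf{M}$, so the prediction stays well defined even when that matrix is singular.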

Paper Structure (10 sections, 1 theorem, 15 equations, 2 figures)

Key Result

Proposition 1

Let ${\mathbf W}$ be doubly stochastic. Then the gradient flow of $\boldsymbol{\vartheta}_t$ in (eq:dgd_lin_gradflow) is bounded-input bounded-output (BIBO) stable if the system is minimal.
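Proposition 1 assumes a doubly stochastic mixing matrix ${\mathbf W}$. This summary does not say how ${\mathbf W}$ is built, so the sketch below assumes Metropolis weights, one standard way to obtain a symmetric doubly stochastic matrix from an undirected graph. The helper names (`cycle_graph`, `star_graph`, `complete_graph`, `metropolis_weights`, `is_doubly_stochastic`) are hypothetical.

```python
import numpy as np

def cycle_graph(Q):
    # Adjacency matrix of a ring: agent i is linked to agents i-1 and i+1.
    adj = np.zeros((Q, Q))
    for i in range(Q):
        adj[i, (i + 1) % Q] = adj[(i + 1) % Q, i] = 1.0
    return adj

def star_graph(Q):
    # Agent 0 is the hub; every other agent is linked only to the hub.
    adj = np.zeros((Q, Q))
    adj[0, 1:] = adj[1:, 0] = 1.0
    return adj

def complete_graph(Q):
    # Every pair of distinct agents is linked.
    return np.ones((Q, Q)) - np.eye(Q)

def metropolis_weights(adj):
    # Assumed construction: W_ij = 1 / (1 + max(deg_i, deg_j)) on edges,
    # with the diagonal chosen so that each row sums to one.
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    Q = adj.shape[0]
    W = np.zeros((Q, Q))
    for i in range(Q):
        for j in range(Q):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def is_doubly_stochastic(W, tol=1e-9):
    # Nonnegative entries, and every row and every column sums to one.
    return bool(np.all(W >= -tol)
                and np.allclose(W.sum(axis=0), 1.0, atol=tol)
                and np.allclose(W.sum(axis=1), 1.0, atol=tol))

# Usage: check the assumption for three topologies with five agents each.
for build in (cycle_graph, star_graph, complete_graph):
    W = metropolis_weights(build(5))
    print(build.__name__, is_doubly_stochastic(W))
```

Because the weights are symmetric and each row sums to one, the columns sum to one as well, which is the property the proposition requires.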

Figures (2)

  • Figure 1: Loss dynamics for each agent in a complete communication graph when solving (eq:linear_exp) with DGD (affine classifier $f$).
  • Figure 2: Loss dynamics for each agent when solving (eq:linear_exp) with DGD and a neural network classifier $f$ over (a) cycle graph, (b) star graph, and (c) complete graph communication networks.

Theorems & Definitions (2)

  • Proposition 1: DGD Gradient Flow Stability
  • proof