Uniform-in-time concentration in two-layer neural networks via transportation inequalities

Arnaud Guillin, Boris Nectoux, Paul Stos

TL;DR

This work quantifies the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent and their mean-field limit, for quadratic loss and ridge regularization, and proves uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$.
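To make the training setting concrete, here is a minimal sketch of online SGD for a two-layer network in mean-field scaling, $f_N(x) = \frac{1}{N}\sum_{i=1}^N \varphi(\theta_i, x)$, with quadratic loss and a ridge (L2) penalty. The neuron $\varphi(\theta, x) = a\tanh(\langle w, x\rangle)$, the step-size scaling, the Gaussian initialisation and the helper names `predict`, `sgd_step` and `init_params` are illustrative assumptions, not the paper's exact scheme. The object studied in the paper is the empirical measure of the particles $\theta_1, \dots, \theta_N$ that this loop updates.

```python
import math
import random

# Hypothetical neuron phi(theta, x) = a * tanh(<w, x>), theta = (a, w).
# The paper's activation and scaling may differ; this is only a sketch.
def neuron(theta, x):
    a, w = theta
    return a * math.tanh(sum(wj * xj for wj, xj in zip(w, x)))

def neuron_grad(theta, x):
    """Gradient of phi with respect to (a, w)."""
    a, w = theta
    t = math.tanh(sum(wj * xj for wj, xj in zip(w, x)))
    dt = 1.0 - t * t  # tanh'(z) = 1 - tanh(z)^2
    return t, [a * dt * xj for xj in x]

def predict(params, x):
    """Mean-field network f_N(x) = (1/N) * sum_i phi(theta_i, x)."""
    return sum(neuron(th, x) for th in params) / len(params)

def sgd_step(params, x, y, lr, ridge):
    """One online SGD step on 0.5 * (f_N(x) - y)^2 plus a ridge penalty.

    The 1/N factor of f_N is absorbed into the step size (mean-field
    scaling, assumed here), so each particle moves by O(lr), not O(lr / N).
    """
    err = predict(params, x) - y
    updated = []
    for a, w in params:
        ga, gw = neuron_grad((a, w), x)
        a_new = a - lr * (err * ga + ridge * a)
        w_new = [wj - lr * (err * gj + ridge * wj) for wj, gj in zip(w, gw)]
        updated.append((a_new, w_new))
    return updated

def init_params(n_neurons, dim, seed=0):
    """i.i.d. standard Gaussian particles (an assumed initialisation)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), [rng.gauss(0.0, 1.0) for _ in range(dim)])
            for _ in range(n_neurons)]

if __name__ == "__main__":
    rng = random.Random(1)
    params = init_params(n_neurons=50, dim=2, seed=0)
    for _ in range(2000):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = math.sin(math.pi * x[0])  # hypothetical regression target
        params = sgd_step(params, x, y, lr=0.05, ridge=1e-3)
    print(predict(params, [0.5, 0.0]))
```

Each call to `sgd_step` moves all $N$ particles using the same sample $(x, y)$, which is what couples them through the prediction error; the ridge term pulls every particle towards the origin.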

Abstract

We quantify, uniformly over time and with high probability, the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent (SGD) and their mean-field limit, for quadratic loss and ridge regularization. As a key ingredient, we establish $T_p$ transportation inequalities ($p \in \{1, 2\}$) for the law of the SGD parameters, with explicit constants independent of the iteration index. We then prove uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$, and we translate these bounds into prediction-error estimates against a fixed test function $\Phi$. We also derive analogous concentration bounds in the sliced-Wasserstein distance $SW_1$, leading to dimension-free rates.
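The sliced-Wasserstein distance averages one-dimensional distances over projection directions, $SW_1(\mu,\nu) = \int_{S^{d-1}} W_1(\langle u,\cdot\rangle_\#\mu, \langle u,\cdot\rangle_\#\nu)\,d\sigma(u)$, and in one dimension $W_1$ between two empirical measures with the same number of atoms reduces to matching sorted samples. The sketch below estimates $SW_1$ between two particle clouds (for example SGD parameters versus samples from the mean-field limit) by a Monte Carlo average over random directions. The equal-sample-size restriction and the helper names `w1_1d` and `sliced_w1` are assumptions for illustration.

```python
import math
import random

def w1_1d(xs, ys):
    """W1 between two 1D empirical measures with the same number of atoms.

    In 1D the optimal coupling matches sorted samples, so W1 is the mean
    absolute difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def random_direction(d, rng):
    """Uniform direction on the unit sphere S^{d-1} (normalised Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]

def sliced_w1(X, Y, n_proj=200, seed=0):
    """Monte Carlo estimate of SW1 between empirical measures on R^d."""
    rng = random.Random(seed)
    d = len(X[0])
    total = 0.0
    for _ in range(n_proj):
        u = random_direction(d, rng)
        # Project both clouds onto u, then compare the 1D marginals.
        px = [sum(ui * xi for ui, xi in zip(u, x)) for x in X]
        py = [sum(ui * yi for ui, yi in zip(u, y)) for y in Y]
        total += w1_1d(px, py)
    return total / n_proj
```

As a sanity check, translating a cloud by a vector $v$ shifts each projection by $\langle u, v\rangle$, so the estimate approximates the average of $|\langle u, v\rangle|$ over directions, which in $\mathbb{R}^2$ equals $2/\pi$ for a unit vector $v$. Each projection only requires sorting, which is why sliced distances are cheap to evaluate in high dimension.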

Paper Structure (17 sections, 13 theorems, 116 equations)

Key Result

Proposition 1

Fix $p\in\{1,2\}$. Assume (A1) through (A3), (A4$_p$), and $L_N < 1$. Then for all $k\in\mathbb{N}$, $\nu_k \in T_p(C_N^{(p)})$ on $(\mathcal{E},\|\cdot\|_p)$, with the explicit constants

Theorems & Definitions (25)

  • Proposition 1
  • Remark
  • Corollary 1
  • Proposition 2
  • Remark
  • Theorem 1
  • Lemma 1
  • Proof
  • Proof of Proposition 1
  • Proposition 3
  • ...and 15 more
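The notation $T_p(C)$ in Proposition 1 is not defined in this excerpt. For orientation, the block below records one common convention (assumed; the paper's normalisation of the constant may differ) together with two standard consequences: Gaussian concentration for Lipschitz observables via the Bobkov–Götze characterisation, and the Kantorovich–Rubinstein bound that turns a $W_1$ estimate into a prediction-error estimate. The symbols $\mu^N_k$ (empirical SGD parameter measure), $\bar\mu_t$ (mean-field limit) and $\operatorname{Lip}(\Phi)$ are hypothetical labels, and the last bound assumes $\Phi$ is Lipschitz in the parameter.

```latex
% Assumed convention: \nu \in T_p(C) on a metric space (\mathcal{E}, d) means
W_p(\mu, \nu) \le \sqrt{2C\, H(\mu \mid \nu)}
  \quad \text{for every probability measure } \mu \text{ on } \mathcal{E},
\qquad H(\mu \mid \nu) = \int \log\frac{d\mu}{d\nu}\, d\mu
  \quad (H = +\infty \text{ if } \mu \not\ll \nu).

% Since W_1 \le W_2, T_2(C) implies T_1(C).
% Bobkov--Gotze: \nu \in T_1(C) iff, for every 1-Lipschitz F and every \lambda \in \mathbb{R},
\int e^{\lambda (F - \int F\, d\nu)}\, d\nu \le e^{C\lambda^2/2},
% and Chernoff's bound with \lambda = r/C gives Gaussian concentration:
\nu\Bigl(F - \int F\, d\nu \ge r\Bigr) \le e^{-r^2/(2C)}, \qquad r > 0.

% Prediction error via Kantorovich--Rubinstein duality (hypothetical names):
\bigl|\langle \Phi, \mu^N_k \rangle - \langle \Phi, \bar\mu_t \rangle\bigr|
  \le \operatorname{Lip}(\Phi)\, W_1(\mu^N_k, \bar\mu_t).
```

Under this convention, an iteration-independent constant $C_N^{(p)}$ is what makes the resulting concentration bounds hold uniformly in time.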