On Size-Independent Sample Complexity of ReLU Networks

Mark Sellke

TL;DR

The paper bounds the sample complexity of learning $D$-layer ReLU networks under norm constraints via Rademacher complexity. It refines prior depth-dependent bounds with a subsequence-based contraction argument, giving bounds that often have no explicit dependence on network depth. The main bound is ${\mathcal{R}}_n({\mathcal F}_D) \le 15 B n^{-1/2} P_F(D) \sqrt{\sum_{d=0}^{D-1} R(d)}$, which depends on the per-layer Frobenius and operator norms through $P_F(D)$ and the ratio $R(d)=P_{op}(d)/P_F(d)$. It can be sharpened by selecting an optimal subsequence $0=d_0<d_1<\dots<d_k=D$, which yields ${\mathcal{R}}_n({\mathcal F}_D) \le 5 B n^{-1/2} P_F(D) \sum_{i=1}^k R(d_{i-1}) \sqrt{d_i-d_{i-1}}$. In typical scenarios where $R(d)$ decays rapidly (often exponentially), the bound is nearly depth-independent, and it places no restriction on intermediate widths.
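The two bounds above can be evaluated numerically. The following is a minimal sketch, not the paper's code. It assumes that $P_F(d)$ and $P_{op}(d)$ are products of the first $d$ per-layer Frobenius and operator norm bounds (with the empty product equal to 1), and that $B$ bounds the input norm and $n$ is the sample size; the summary does not spell out these conventions, so treat them as assumptions. The function names are hypothetical. The subsequence is chosen by a simple dynamic program over the endpoints $d_i$.

```python
import math
from itertools import accumulate
from operator import mul


def cumulative_products(norms):
    """Return [P(0), ..., P(D)] with P(0) = 1 (empty product).

    Assumption: P(d) is the product of the first d per-layer norm bounds.
    """
    return [1.0] + list(accumulate(norms, mul))


def norm_ratios(frob, op):
    """Return (P_F, R) where R[d] = P_op(d) / P_F(d) for d = 0..D."""
    p_f = cumulative_products(frob)
    p_op = cumulative_products(op)
    return p_f, [po / pf for po, pf in zip(p_op, p_f)]


def main_bound(frob, op, B, n):
    """15 * B * n^{-1/2} * P_F(D) * sqrt(sum_{d=0}^{D-1} R(d))."""
    p_f, r = norm_ratios(frob, op)
    D = len(frob)
    return 15 * B * p_f[D] * math.sqrt(sum(r[:D])) / math.sqrt(n)


def best_subsequence_bound(frob, op, B, n):
    """5 * B * n^{-1/2} * P_F(D) * min over 0=d_0<...<d_k=D of
    sum_i R(d_{i-1}) * sqrt(d_i - d_{i-1}), found by dynamic programming."""
    p_f, r = norm_ratios(frob, op)
    D = len(frob)
    # cost[j] = cheapest sum over subsequences that start at 0 and end at j.
    cost = [0.0] + [math.inf] * D
    for j in range(1, D + 1):
        cost[j] = min(cost[i] + r[i] * math.sqrt(j - i) for i in range(j))
    return 5 * B * p_f[D] * cost[D] / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical 8-layer network: Frobenius bound 2 and operator bound 1
    # per layer, so R(d) = 2^{-d} decays exponentially.
    frob, op = [2.0] * 8, [1.0] * 8
    print(main_bound(frob, op, B=1.0, n=10_000))
    print(best_subsequence_bound(frob, op, B=1.0, n=10_000))
```

When every $R(d)$ equals 1, the dynamic program picks the single jump $d_0=0$, $d_1=D$, so the second bound reduces to a $\sqrt{D}$ factor; when $R(d)$ decays, shorter early steps become cheaper.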

Abstract

We study the sample complexity of learning ReLU neural networks from the point of view of generalization. Given norm constraints on the weight matrices, a common approach is to estimate the Rademacher complexity of the associated function class. Previously Golowich-Rakhlin-Shamir (2020) obtained a bound independent of the network size (scaling with a product of Frobenius norms) except for a factor of the square-root depth. We give a refinement which often has no explicit depth-dependence at all.

Paper Structure (4 sections, 2 theorems, 14 equations)

Introduction
Problem Formulation and Main Result
Main Argument
Optimizing the Choice of Subsequence

Key Result

Theorem 1

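In the setting of the setup subsection (subsec:setup), we have the Rademacher complexity bound

$${\mathcal{R}}_n({\mathcal F}_D) \le 15 B n^{-1/2} P_F(D) \sqrt{\sum_{d=0}^{D-1} R(d)}.$$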
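To illustrate how explicit depth dependence can drop out of the main bound ${\mathcal{R}}_n({\mathcal F}_D) \le 15 B n^{-1/2} P_F(D) \sqrt{\sum_{d=0}^{D-1} R(d)}$ stated in the TL;DR, consider a hypothetical case of geometric decay, $R(d) \le \rho^d$ for some $\rho \in (0,1)$. This decay rate is an assumption made for illustration, not a condition taken from the paper. Summing the geometric series gives

$$\sum_{d=0}^{D-1} R(d) \le \sum_{d=0}^{\infty} \rho^d = \frac{1}{1-\rho}, \qquad\text{so}\qquad {\mathcal{R}}_n({\mathcal F}_D) \le \frac{15\, B\, P_F(D)}{\sqrt{n\,(1-\rho)}}.$$

Under this assumption the depth $D$ enters only through the norm product $P_F(D)$, with no separate $\sqrt{D}$ factor. At the other extreme, if $R(d)=1$ for all $d$, the sum equals $D$ and the $\sqrt{D}$ factor reappears.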

Theorems & Definitions (4)

Theorem 1
Theorem 2
Proof
Proof of Theorem \ref{thm:main}
