On Size-Independent Sample Complexity of ReLU Networks
Mark Sellke
TL;DR
The paper bounds the sample complexity of learning $D$-layer ReLU networks under norm constraints by estimating Rademacher complexity. It refines earlier depth-dependent bounds with a subsequence-based contraction argument, giving ${\mathcal{R}}_n({\mathcal F}_D) \le 15 B n^{-1/2} P_F(D) \sqrt{\sum_{d=0}^{D-1} R(d)}$. The bound depends on the per-layer Frobenius and operator norms through $P_F(D)$ and $R(d)=P_{op}(d)/P_F(d)$. It can be sharpened by optimizing over subsequences $0=d_0<d_1<\dots<d_k=D$, which yields ${\mathcal{R}}_n({\mathcal F}_D) \le 5 B n^{-1/2} P_F(D) \sum_{i=1}^k R(d_{i-1}) \sqrt{d_i-d_{i-1}}$. In typical scenarios $R(d)$ decays rapidly (often exponentially), so the sum over $d$ stays bounded as $D$ grows and the bound is nearly depth-independent, without any restriction on intermediate widths.
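To make the two bounds above concrete, here is a minimal Python sketch that evaluates them numerically. It rests on assumptions the summary does not state: $P_F(d)$ and $P_{op}(d)$ are taken to be products of the first $d$ per-layer Frobenius and operator norm bounds (with the empty product equal to 1, so $R(0)=1$), and $B$ is treated as a given scale constant. All function names are hypothetical, and the subsequence search is a plain dynamic program, not necessarily how the paper chooses the subsequence.

```python
import math


def prefix_products(values):
    """Return [1, v1, v1*v2, ...]; entry d is the product of the first d values."""
    out = [1.0]
    for v in values:
        out.append(out[-1] * v)
    return out


def ratios(frob, op):
    """R(d) = P_op(d) / P_F(d) for d = 0..D (assumed prefix-product definition)."""
    p_f = prefix_products(frob)
    p_op = prefix_products(op)
    return [p_op[d] / p_f[d] for d in range(len(frob) + 1)]


def subsequence_sum(R, seq):
    """sum_i R(d_{i-1}) * sqrt(d_i - d_{i-1}) for a sequence 0 = d_0 < ... < d_k = D."""
    return sum(R[a] * math.sqrt(b - a) for a, b in zip(seq, seq[1:]))


def best_subsequence(R):
    """Minimize subsequence_sum over all subsequences via an O(D^2) dynamic program."""
    D = len(R) - 1
    cost = [math.inf] * (D + 1)
    prev = [None] * (D + 1)
    cost[0] = 0.0
    for j in range(1, D + 1):
        for i in range(j):
            c = cost[i] + R[i] * math.sqrt(j - i)
            if c < cost[j]:
                cost[j], prev[j] = c, i
    # Walk the predecessor links back from D to 0 to recover the sequence.
    seq = [D]
    while seq[-1] != 0:
        seq.append(prev[seq[-1]])
    return cost[D], seq[::-1]


def bound_sqrt_sum(B, n, frob, op):
    """First displayed bound: 15 B n^{-1/2} P_F(D) sqrt(sum_{d=0}^{D-1} R(d))."""
    R = ratios(frob, op)
    p_f_D = prefix_products(frob)[-1]
    return 15.0 * B / math.sqrt(n) * p_f_D * math.sqrt(sum(R[:-1]))


def bound_subsequence(B, n, frob, op, seq=None):
    """Second displayed bound: 5 B n^{-1/2} P_F(D) sum_i R(d_{i-1}) sqrt(d_i - d_{i-1})."""
    R = ratios(frob, op)
    if seq is None:
        _, seq = best_subsequence(R)
    p_f_D = prefix_products(frob)[-1]
    return 5.0 * B / math.sqrt(n) * p_f_D * subsequence_sum(R, seq)
```

For instance, with every layer having Frobenius bound 2 and operator bound 1, `ratios([2.0] * D, [1.0] * D)` decays like $2^{-d}$, and `best_subsequence` returns a cost that stays bounded as `D` increases, which is the near depth-independence described above.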
Abstract
We study the sample complexity of learning ReLU neural networks from the point of view of generalization. Given norm constraints on the weight matrices, a common approach is to estimate the Rademacher complexity of the associated function class. Previously, Golowich-Rakhlin-Shamir (2020) obtained a bound independent of the network size (scaling with a product of Frobenius norms) except for a factor of the square root of the depth. We give a refinement which often has no explicit depth-dependence at all.
