A path-norm toolkit for modern networks: consequences, promises and challenges
Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval
TL;DR
This paper develops a versatile path-norm toolkit for general DAG ReLU networks that include biases, skip connections, and pooling, addressing the limitations of previous path-norm definitions. It introduces a generalized path-lifting $\Phi^G(\bm\theta)$ and path-activations $\mathbf{A}^G(\bm\theta,x)$, and defines $L^q$ path-norms and mixed path-norms that yield end-to-end Lipschitz bounds and are sharper than products of operator norms on layered fully-connected networks. A new generalization bound is derived for cross-entropy loss on arbitrary DAG ReLU architectures, incorporating depth, pooling variety, and output dimensions; contraction lemmas and a peeling argument underpin the bound, which can be tightened via margin-based analyses for top-1 accuracy. Empirical results on ImageNet with ResNets reveal a gap between theory and practice for dense models, while sparsity can substantially reduce the bound, suggesting practical avenues to close the gap. Overall, the work provides the first comprehensive framework for path-norm-based generalization on modern networks and highlights concrete directions to bring theory closer to observed performance in real-world settings.
Abstract
This work introduces the first toolkit around path-norms that fully encompasses general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort, etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm-based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on layered fully-connected networks compared to the product of operator norms, another commonly used complexity measure. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.
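To make the claims about ease of computation and invariance concrete, the following is a minimal sketch in the simplest setting only: a layered fully-connected ReLU network without biases, not the general DAG setting with biases and pooling covered by the toolkit. In that restricted setting, the $L^1$ path-norm (the sum over all input-to-output paths of the absolute product of the weights along the path) equals the sum of the entries of $|W_L| \cdots |W_1| \mathbf{1}$, so a single forward pass with absolute weights suffices. The function names, the choice of the induced $\ell^\infty$ operator norm for the comparison, and the toy weights are assumptions made for illustration.

```python
import numpy as np

def l1_path_norm(weights):
    """L1 path-norm of a bias-free layered network x -> W_L relu(... relu(W_1 x)).

    Assumption: `weights` is a list [W_1, ..., W_L] of 2-D arrays, where W_l has
    shape (fan_out, fan_in). One forward pass with absolute weights on the
    all-ones input sums |product of weights| over every input-to-output path.
    """
    v = np.ones(weights[0].shape[1])
    for W in weights:
        v = np.abs(W) @ v
    return float(v.sum())

def product_of_operator_norms(weights):
    """Product of induced l-infinity operator norms (max absolute row sums)."""
    return float(np.prod([np.linalg.norm(W, ord=np.inf) for W in weights]))

def rescale_hidden_neuron(weights, layer, neuron, lam):
    """Multiply the incoming weights of one hidden neuron by lam > 0 and divide
    its outgoing weights by lam. Because ReLU is positively homogeneous, the
    function computed by the (bias-free) network is unchanged.
    """
    out = [np.array(W, dtype=float) for W in weights]
    out[layer][neuron, :] *= lam
    out[layer + 1][:, neuron] /= lam
    return out

# Hypothetical toy network: 2 inputs -> 2 hidden neurons -> 1 output.
toy_weights = [np.array([[1.0, -2.0], [0.0, 3.0]]), np.array([[1.0, 1.0]])]
rescaled_weights = rescale_hidden_neuron(toy_weights, layer=0, neuron=0, lam=4.0)

for name, ws in [("original", toy_weights), ("rescaled", rescaled_weights)]:
    print(name, l1_path_norm(ws), product_of_operator_norms(ws))
```

On this toy example the path-norm is 6 for both parameterizations, while the product of operator norms moves from 6 to 15 after rescaling: the two parameter vectors define the same function, yet only the path-norm assigns them the same complexity.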
