On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond
Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao
TL;DR
This paper derives a margin-based, data-dependent generalization bound for deep neural networks using a Jacobian-based Lipschitz analysis, yielding a bound that scales with the norm of the network Jacobian rather than with the product of weight-matrix norms. The authors present a tighter bound for generic DNNs and specialized, architecture-aware versions for CNNs and ResNets, including cases with bounded losses and width-change operations. They show that, for CNNs with orthogonal filters and for structured ResNets, the resulting empirical Rademacher complexity (ERC) bounds depend on kernel-level quantities or skip-connection norms rather than on full layer widths, enabling tighter generalization estimates for practical networks. Numerical experiments on CIFAR-10 corroborate the theory, showing that the proposed bounds are significantly tighter than existing norm-based bounds and that networks with larger parameter spaces can still generalize well. Overall, the work provides a principled, Jacobian-centric framework for bounding generalization in deep models and offers concrete, structure-aware refinements for popular architectures.
Abstract
We establish a margin-based, data-dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian, of the networks. By introducing a new characterization of the Lipschitz properties of the neural network family, we achieve significantly tighter generalization bounds than existing results. Moreover, we show that the generalization bound can be further improved for bounded losses. Beyond general feedforward deep neural networks, our results can be applied to derive new bounds for popular architectures, including convolutional neural networks (CNNs) and residual networks (ResNets). To achieve the same generalization error as prior work, our bounds allow a larger parameter space for the weight matrices, potentially giving the networks stronger expressive ability. Numerical evaluation is also provided to support our theory.
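To make the contrast between a Jacobian-based quantity and a product of weight norms concrete, the following is a minimal NumPy sketch, not the paper's bound or experimental setup. It assumes a bias-free ReLU network with random Gaussian weights and a single random input; the layer widths and the helper names `relu_jacobian` and `spectral_norm` are hypothetical choices made for illustration. It only checks the elementary inequality that the spectral norm of the input-output Jacobian at a point never exceeds the product of the layers' spectral norms.

```python
import numpy as np

def relu_jacobian(weights, x):
    """Input-output Jacobian of a bias-free ReLU network at input x.

    The Jacobian is W_L D_{L-1} W_{L-1} ... D_1 W_1, where D_i is the
    diagonal 0/1 matrix of active ReLU units in hidden layer i.
    """
    jac = np.eye(x.shape[0])
    h = x
    for i, W in enumerate(weights):
        z = W @ h
        jac = W @ jac
        if i < len(weights) - 1:           # ReLU on hidden layers only
            mask = (z > 0).astype(z.dtype)
            h = z * mask
            jac = mask[:, None] * jac      # left-multiply by D_i
        else:
            h = z                          # linear output layer
    return jac

def spectral_norm(M):
    """Largest singular value of a 2-D array."""
    return np.linalg.norm(M, ord=2)

# Illustrative (assumed) architecture and data.
rng = np.random.default_rng(0)
widths = [32, 64, 64, 64, 10]
weights = [
    rng.standard_normal((widths[i + 1], widths[i])) / np.sqrt(widths[i])
    for i in range(len(widths) - 1)
]
x = rng.standard_normal(widths[0])

jac_norm = spectral_norm(relu_jacobian(weights, x))
prod_norm = float(np.prod([spectral_norm(W) for W in weights]))

print(f"Jacobian spectral norm at x:     {jac_norm:.4f}")
print(f"Product of layer spectral norms: {prod_norm:.4f}")
```

Because each ReLU mask has spectral norm at most one, the first printed value can never exceed the second. How large the gap is depends on the weights and the input, so this sketch says nothing quantitative about the bounds in the paper.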
