Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Qiaozhe Zhang, Jun Sun, Ruijie Zhang, Yingzhuang Liu
TL;DR
This work introduces Rényi sharpness, defined as the negative Rényi entropy $-H_{\alpha}(\textbf{H})$ of the normalized Hessian spectrum, to capture spectrum spread and its relation to generalization. It establishes reparameterization-invariant bounds that connect population risk to Rényi sharpness, and provides a practical estimator using stochastic Lanczos quadrature for large Hessians. The authors demonstrate a strong Kendall correlation between Rényi sharpness and generalization across multiple architectures and datasets, outperforming traditional sharpness metrics. They further propose RSAM, a computationally efficient regularizer that encourages lower Rényi sharpness during training and achieves improvements over SAM-based methods on several benchmarks. Overall, the paper combines theory and applied methodology to link Hessian spectral structure with generalization, offering a scalable path to improved training through Rényi sharpness regularization.
Abstract
Sharpness (of the loss minima) is a common measure used to investigate the generalization of neural networks. Intuitively, the flatter the landscape near a minimum, the better the generalization might be. Unfortunately, the correlation between many existing sharpness measures and generalization is usually not strong, and sometimes even weak. To close the gap between intuition and reality, we propose a novel sharpness measure, \textit{Rényi sharpness}, defined as the negative Rényi entropy (a generalization of the classical Shannon entropy) of the loss Hessian. The main ideas are as follows: 1) we observe that \textit{uniform} (identical) eigenvalues of the loss Hessian are most desirable (while keeping the sum constant) for achieving good generalization; 2) we employ the \textit{Rényi entropy} to concisely characterize the extent of the spread of the eigenvalues of the loss Hessian. Normally, the larger the spread, the smaller the (Rényi) entropy. To rigorously establish the relationship between generalization and (Rényi) sharpness, we provide several generalization bounds in terms of Rényi sharpness, by taking advantage of the reparametrization invariance property of Rényi sharpness, as well as the trick of translating the data discrepancy to the weight perturbation. Furthermore, extensive experiments are conducted to verify the strong correlation (specifically, the Kendall rank correlation) between Rényi sharpness and generalization. Moreover, we propose to use a variant of Rényi sharpness as a regularizer during training, i.e., Rényi Sharpness Aware Minimization (RSAM), which turns out to outperform all existing sharpness-aware minimization methods. It is worth noting that the test accuracy gain of our proposed RSAM method can be as high as nearly 2.5\% compared with the classical SAM method.
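
To make the two main ideas concrete, the following minimal sketch computes the negative Rényi entropy of a trace-normalized eigenvalue spectrum. It is an illustration under stated assumptions, not the authors' implementation: the function name `renyi_sharpness`, the order parameter `alpha` with default 2, the normalization by the eigenvalue sum, and the restriction to non-negative eigenvalues are all choices made here, and the paper's exact definition may differ. The spectrum is passed in explicitly as a small array; for large Hessians the TL;DR indicates that it is estimated with stochastic Lanczos quadrature instead.

```python
import numpy as np


def renyi_sharpness(eigenvalues, alpha=2.0):
    """Negative Renyi entropy of order `alpha` of a trace-normalized spectrum.

    Hypothetical helper for illustration only; assumes non-negative
    eigenvalues with a positive sum and alpha > 0.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    if np.any(lam < 0):
        raise ValueError("this sketch assumes non-negative eigenvalues")
    total = lam.sum()
    if total <= 0:
        raise ValueError("eigenvalue sum must be positive")

    # Normalize so the spectrum sums to one, like a probability vector.
    p = lam / total

    if np.isclose(alpha, 1.0):
        # alpha -> 1 recovers the Shannon entropy (with 0 * log 0 taken as 0).
        nonzero = p[p > 0]
        entropy = -np.sum(nonzero * np.log(nonzero))
    else:
        # Renyi entropy of order alpha: log(sum_i p_i^alpha) / (1 - alpha).
        entropy = np.log(np.sum(p ** alpha)) / (1.0 - alpha)

    return -entropy


# Two spectra with the same sum (4.0) but a different spread.
uniform = [1.0, 1.0, 1.0, 1.0]
peaked = [3.97, 0.01, 0.01, 0.01]

print(renyi_sharpness(uniform))  # equals -log(4), the minimum for 4 eigenvalues
print(renyi_sharpness(peaked))   # larger (sharper), since the spectrum is spread out
```

With the sum held fixed, the uniform spectrum attains the maximum entropy and therefore the lowest sharpness, while the concentrated spectrum scores higher, matching the statement that a larger spread gives a smaller entropy.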
