The effects of Hessian eigenvalue spectral density type on the applicability of Hessian analysis to generalization capability assessment of neural networks
Nikita Gabdullin
TL;DR
This work investigates how the Hessian eigenvalue spectral density (HESD) relates to neural network generalization, focusing on the prevalence of MP-HESD (mostly positive eigenvalues) during standard training and the emergence of MN-HESD (mostly negative eigenvalues) under external gradient manipulation. It introduces a practical criterion, Ct, for classifying HESD type and outlines a unified generalization assessment framework that remains reliable for MP-HESD but breaks down for MN-HESD. The study demonstrates that data augmentation and typical optimizers maintain MP-HESD and good generalization, while gradient manipulations (e.g., AdaHessian, clipping) can induce MN-HESD and undermine Hessian-based conclusions. It further proposes updated numerical conditions and an algorithm for assessing generalization from HESD, emphasizing the need to verify HESD type before applying Hessian-based criteria and acknowledging quasi-singular HESD as a potential confounder. The findings support Hessian-based generalization analysis in MP-HESD scenarios and warn against relying on it when MN-HESD is present, with implications for model selection and training protocols in real-world settings.
Abstract
Hessians of neural networks (NNs) contain essential information about the curvature of NN loss landscapes, which can be used to estimate NN generalization capabilities. We have previously proposed generalization criteria that rely on the observation that the Hessian eigenvalue spectral density (HESD) behaves similarly for a wide class of NNs. This paper further studies the applicability of these criteria by investigating factors that can result in different types of HESD. We conduct a wide range of experiments showing that HESD has mainly positive eigenvalues (MP-HESD) for NN training and fine-tuning with various optimizers on different datasets with different preprocessing and augmentation procedures. We also show that HESD with mainly negative eigenvalues (MN-HESD) is a consequence of external gradient manipulation, indicating that the previously proposed Hessian analysis methodology cannot be applied in such cases. We further propose criteria and corresponding conditions to determine HESD type and estimate NN generalization potential. These HESD types and the previously proposed generalization criteria are combined into a unified HESD analysis methodology. Finally, we discuss how HESD changes during training, and show the occurrence of quasi-singular (QS) HESD and its influence on the proposed methodology and on the conventional assumptions about the relation between Hessian eigenvalues and NN loss landscape curvature.
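To make the MP-HESD and MN-HESD distinction concrete, the following minimal Python sketch computes the Hessian eigenvalues of a toy quadratic loss and labels the spectrum by comparing the total magnitude of positive and negative eigenvalues. This is an illustration under stated assumptions, not the criterion proposed in this paper: the function names `numerical_hessian` and `classify_hesd`, the tolerance `tol`, and the mass-comparison rule are all hypothetical choices made for this example, and a real NN would require an approximate spectral density estimate rather than a dense finite-difference Hessian.

```python
import numpy as np


def numerical_hessian(loss, w, eps=1e-3):
    """Central-difference Hessian of a scalar loss at parameter vector w."""
    n = w.size
    steps = np.eye(n) * eps  # row i is a step of size eps along axis i
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            hess[i, j] = (
                loss(w + steps[i] + steps[j])
                - loss(w + steps[i] - steps[j])
                - loss(w - steps[i] + steps[j])
                + loss(w - steps[i] - steps[j])
            ) / (4.0 * eps**2)
    return hess


def classify_hesd(hessian, tol=1e-6):
    """Label a spectrum as 'MP', 'MN', or 'degenerate'.

    Hypothetical rule for illustration only (not the paper's criterion):
    compare the summed magnitudes of positive and negative eigenvalues,
    ignoring eigenvalues whose magnitude is below tol.
    """
    sym = 0.5 * (hessian + hessian.T)  # remove round-off asymmetry
    eigvals = np.linalg.eigvalsh(sym)  # ascending real eigenvalues
    pos_mass = eigvals[eigvals > tol].sum()
    neg_mass = -eigvals[eigvals < -tol].sum()
    if pos_mass == 0 and neg_mass == 0:
        return "degenerate", eigvals
    return ("MP" if pos_mass >= neg_mass else "MN"), eigvals


# Toy loss with curvatures 3, 1 and -0.5 along the coordinate axes.
curvatures = np.diag([3.0, 1.0, -0.5])


def toy_loss(w):
    return 0.5 * w @ curvatures @ w


w0 = np.array([0.1, -0.2, 0.3])
label, eigvals = classify_hesd(numerical_hessian(toy_loss, w0))
# Negating the loss flips the sign of every eigenvalue.
label_flipped, _ = classify_hesd(
    numerical_hessian(lambda w: -toy_loss(w), w0)
)
print(label, label_flipped)
```

In this toy case the positive curvature dominates, so the spectrum is labeled MP, and negating the loss yields MN. The dense Hessian costs a number of loss evaluations quadratic in the parameter count, which is why it is practical only for very small examples like this one.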
