Investigating generalization capabilities of neural networks by means of loss landscapes and Hessian analysis

Nikita Gabdullin

TL;DR

The paper investigates neural network generalization through loss landscape visualization and Hessian analysis, highlighting that conventional plotting can fail with batch normalization. It introduces Loss Landscape Analysis (LLA) and advocates Hessian-axis plots alongside spectral-density criteria to quantify curvature and relate it to generalization. The KH05 criterion, derived from Hessian spectra, emerges as a practical proxy that tracks changes in generalization performance across architectures and dataset shifts, particularly for large-scale datasets. The study demonstrates that these methods offer a computationally efficient means to estimate generalization, with implications for model selection and robustness in real-world settings.
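To illustrate what plotting a loss landscape "along axes" involves, the following is a minimal pure-Python sketch of the conventional random-axis approach. Two random directions are rescaled filter by filter to the L2 norms of the corresponding weight filters (the filter-wise normalization of Li et al., 2018, referred to as filter L2 normalization in Figure 3), and the loss is then evaluated on a grid of θ + α·d1 + β·d2. The toy linear model, its data and weights, and the names `filter_normalized_direction` and `landscape` are hypothetical and are not LLA's API. The optional `cap` mimics the loss capping of Figure 4. The toy model has no batch normalization, so it does not reproduce the failure mode discussed in the paper.

```python
import math
import random

# Hypothetical toy "network": a single linear layer whose weights are stored
# as a list of filters (rows); each filter maps a 3-d input to one output.
X = [[1.0, 0.0, 2.0], [0.5, -1.0, 1.0], [-1.0, 2.0, 0.0], [2.0, 1.0, -1.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]


def loss(filters):
    """Mean squared error of the toy linear model over the toy dataset."""
    total = 0.0
    for x, y in zip(X, Y):
        for f, target in zip(filters, y):
            pred = sum(w * xi for w, xi in zip(f, x))
            total += (pred - target) ** 2
    return total / len(X)


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def filter_normalized_direction(filters, rng):
    """Random Gaussian direction, rescaled so that each direction filter has
    the same L2 norm as the corresponding weight filter."""
    direction = []
    for f in filters:
        d = [rng.gauss(0.0, 1.0) for _ in f]
        scale = norm(f) / norm(d)
        direction.append([a * scale for a in d])
    return direction


def landscape(filters, d1, d2, span=1.0, steps=5, cap=None):
    """Evaluate L(theta + alpha*d1 + beta*d2) on a steps x steps grid with
    alpha, beta in [-span, span]; values above `cap` are clipped if given."""
    coords = [-span + 2 * span * i / (steps - 1) for i in range(steps)]
    grid = []
    for alpha in coords:
        row = []
        for beta in coords:
            moved = [[w + alpha * u + beta * v for w, u, v in zip(f, g, h)]
                     for f, g, h in zip(filters, d1, d2)]
            value = loss(moved)
            row.append(min(value, cap) if cap is not None else value)
        grid.append(row)
    return coords, grid


rng = random.Random(0)
theta = [[0.3, -0.2, 0.4], [0.1, 0.5, -0.3]]  # hypothetical "trained" weights
d1 = filter_normalized_direction(theta, rng)
d2 = filter_normalized_direction(theta, rng)
coords, grid = landscape(theta, d1, d2, cap=100.0)
```

With a real network, `loss` would run the model on a data batch, and `grid` would be passed to a contour or surface plot.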

Abstract

This paper studies the generalization capabilities of neural networks (NNs) using Loss Landscape Analysis (LLA), a new and improved PyTorch library. LLA facilitates visualization and analysis of loss landscapes along with the properties of the NN Hessian. Different approaches to NN loss landscape plotting are discussed, with particular focus on normalization techniques, showing that conventional methods cannot always ensure correct visualization when batch normalization layers are present in the NN architecture. Using Hessian axes is shown to mitigate this effect, and methods for choosing Hessian axes are proposed. In addition, the spectra of the Hessian eigendecomposition are studied, and it is shown that typical spectra exist for a wide range of NNs. This makes it possible to propose quantitative criteria for Hessian analysis that can be applied to evaluate NN performance and assess its generalization capabilities. Generalization experiments are conducted using ImageNet-1K pre-trained models along with several models trained as part of this study. The experiments include training models on one dataset and testing them on another, to make the setup as similar as possible to model performance in the wild. It is shown that when datasets change, the changes in the criteria correlate with the changes in accuracy, making the proposed criteria a computationally efficient estimate of generalization ability, which is especially useful for extremely large datasets.
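Hessian axes replace the random directions with eigenvectors of the loss Hessian at the trained weights. A common way to obtain the leading eigenvectors without forming the Hessian is power iteration on Hessian-vector products, with deflation to obtain further axes. The sketch below assumes that approach and uses a hypothetical three-parameter quadratic loss with a hand-written gradient (for a PyTorch model, autograd would supply the gradient). It is not LLA's implementation, and the methods for choosing Hessian axes proposed in the paper may differ. Power iteration returns the eigenvalues of largest magnitude, which need not be the most positive ones away from a minimum.

```python
import math
import random


def grad(theta):
    """Analytic gradient of a hypothetical toy loss
    L(x, y, z) = 3x^2 + 3y^2 + 4xy + 0.1z^2 (autograd would supply this for a real NN)."""
    x, y, z = theta
    return [6 * x + 4 * y, 6 * y + 4 * x, 0.2 * z]


def hvp(theta, v, eps=1e-4):
    """Hessian-vector product via central differences of the gradient:
    H v ~ (grad(theta + eps v) - grad(theta - eps v)) / (2 eps); H is never formed."""
    gp = grad([t + eps * a for t, a in zip(theta, v)])
    gm = grad([t - eps * a for t, a in zip(theta, v)])
    return [(p - m) / (2 * eps) for p, m in zip(gp, gm)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def deflated_hvp(theta, v, pairs):
    """(H - sum of lam * u u^T over found eigenpairs) applied to v."""
    w = hvp(theta, v)
    for lam, u in pairs:
        c = lam * dot(u, v)
        w = [wi - c * ui for wi, ui in zip(w, u)]
    return w


def top_hessian_axes(theta, k=2, iters=200, seed=0):
    """Power iteration with deflation; returns k (eigenvalue, unit eigenvector)
    pairs, ordered by decreasing |eigenvalue|."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in theta])
        for _ in range(iters):
            v = normalize(deflated_hvp(theta, v, pairs))
        lam = dot(v, deflated_hvp(theta, v, pairs))  # Rayleigh quotient
        pairs.append((lam, v))
    return pairs


axes = top_hessian_axes([0.0, 0.0, 0.0])  # Hessian axes at the toy minimum
```

For this toy loss the Hessian is [[6, 4, 0], [4, 6, 0], [0, 0, 0.2]], so `axes` should hold eigenvalues 10 and 2 with eigenvectors (1, 1, 0)/√2 and (1, −1, 0)/√2, up to sign. Those two vectors could then serve as d1 and d2 in a grid evaluation of L(θ + α·d1 + β·d2).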

Paper Structure

This paper contains 19 sections, 3 equations, 19 figures, 4 tables.

Figures (19)

  • Figure 1: Loss landscapes of (left) ResNet18, and (right) ViT-small plotted along random axes.
  • Figure 2: Loss landscapes for ResNet18 in different regimes: (left) value explosion in eval mode, (center) normal behavior with random axes in train mode, (right) normal behavior with Hessian axes in train mode.
  • Figure 3: (Left) MobileNet and (right) ViT loss landscapes with filter L2 normalization, which exhibit no "value explosion".
  • Figure 4: Filter-normalized loss landscapes of models that exhibit value explosion, typical for SqueezeNet, AlexNet, and LeNet: (left) no loss cap and (right) loss capped at 100.
  • Figure 5: HESD plots of untrained neural networks with randomly initialized weights: (left) AlexNet, (center) SqueezeNet 1.1, and (right) MobileNetV2.
  • ...and 14 more figures
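Figure 5 refers to HESD plots. Assuming HESD denotes the Hessian eigenvalue spectral density, the sketch below computes one for a small symmetric matrix: an exact eigendecomposition by cyclic Jacobi rotations, followed by Gaussian broadening of the eigenvalues. It is illustrative only; for networks with millions of parameters the full Hessian cannot be decomposed, and the density is usually approximated instead (for example with stochastic Lanczos quadrature). The matrix, the broadening width `sigma`, and the function names are assumptions, and the sketch does not compute the KH05 criterion.

```python
import math


def jacobi_eigenvalues(a, tol=1e-22, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    m = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # m <- m J
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # m <- J^T m
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
                m[p][q] = m[q][p] = 0.0
    return sorted(m[i][i] for i in range(n))


def spectral_density(eigs, grid, sigma=0.1):
    """Gaussian-broadened eigenvalue density; integrates to about 1 over a wide grid."""
    scale = 1.0 / (len(eigs) * sigma * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-((x - e) ** 2) / (2.0 * sigma ** 2)) for e in eigs)
            for x in grid]


# Hypothetical 3x3 Hessian used only for illustration.
H = [[6.0, 4.0, 0.0], [4.0, 6.0, 0.0], [0.0, 0.0, 0.2]]
eigs = jacobi_eigenvalues(H)
grid = [-2.0 + 0.01 * i for i in range(1601)]
density = spectral_density(eigs, grid)
```

Here the density shows three peaks near 0.2, 2 and 10; a larger `sigma` gives a smoother curve at the cost of merging nearby eigenvalues.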