Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime

Hiroki Naganuma; Taiji Suzuki; Rio Yokota; Masahiro Nomura; Kohta Ishikawa; Ikuro Sato


TL;DR

This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of DNNs and demonstrates that TIC provides better trial pruning ability than existing methods for hyperparameter optimization.

Abstract

Generalization measures have been studied extensively in the machine learning community to better characterize generalization gaps. However, establishing a reliable generalization measure for statistically singular models such as deep neural networks (DNNs) is difficult due to their complex nature. This study focuses on Takeuchi's information criterion (TIC) to investigate the conditions under which this classical measure can effectively explain the generalization gaps of DNNs. Importantly, the developed theory indicates the applicability of TIC near the neural tangent kernel (NTK) regime. In a series of experiments, we trained more than 5,000 DNN models with 12 architectures, including large models (e.g., VGG-16), on four datasets, and estimated the corresponding TIC values to examine the relationship between the generalization gap and the TIC estimates. We applied several TIC approximation methods with feasible computational costs and assessed the accuracy trade-off. Our experimental results indicate that the estimated TIC values correlate well with the generalization gap under conditions close to the NTK regime. However, we show both theoretically and empirically that outside the NTK regime such correlation disappears. Finally, we demonstrate that TIC provides better trial pruning ability than existing methods for hyperparameter optimization.
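The trace terms that appear in TIC-style measures can be expensive to form exactly for large models. As a minimal sketch of one standard way to estimate a trace from matrix-vector products alone, the snippet below uses Hutchinson's estimator with Rademacher probe vectors. It is an assumption that this technique is relevant here; the abstract only says that several approximation methods were applied. The function name `hutchinson_trace`, the probe count, and the random test matrix are all hypothetical choices for illustration.

```python
import numpy as np


def hutchinson_trace(matvec, dim, num_probes=500, seed=0):
    """Estimate Tr(A) using only products A @ v (Hutchinson's estimator).

    `matvec` is any callable returning A @ v for a vector v of length `dim`.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: entries are +1 or -1 with equal probability,
        # so E[v v^T] = I and E[v^T A v] = Tr(A).
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ matvec(v)
    return total / num_probes


# Illustrative check on a synthetic positive semi-definite matrix
# (made-up data, not from the paper).
dim = 50
B = np.random.default_rng(1).normal(size=(dim, dim))
A = B @ B.T / dim
estimate = hutchinson_trace(lambda v: A @ v, dim)
exact = np.trace(A)
print(f"estimate: {estimate:.3f}, exact: {exact:.3f}")
```

The estimator is unbiased, and its variance shrinks as the number of probes grows, which is the usual trade-off between cost and accuracy for this kind of approximation.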


Paper Structure (44 sections, 2 theorems, 18 equations, 32 figures, 9 tables)


Introduction
Generalization Measures
Which Generalization Measure is Promising?
Information Matrix: Elements of Generalization Measures
Deriving TIC as the Generalization Gap in the NTK Regime
Approximation of TIC
Hessian, Generalized Gauss-Newton Matrix (GGN) and FIM
Approximation of Matrices and Trace Estimation
Experiments
Overview
Small-Scale Experiments: Comparing Approximation and Exact Results
Practical Scale Experiments: Correlation to the Generalization Gap and TIC Lower Bound, TIC with Diagonal Approximation
Runtime Measurement Experiments
Application to Hyperparameter Optimization
Recent Related Work
...and 29 more sections

Key Result

Proposition 2.1

Under assumptions (A1) and (A2), the bias $b$ of the empirical loss as an estimator of the expected loss is given by TIC, expressed in terms of $\boldsymbol{H}_p(\boldsymbol{\theta}^{*})$ and $\boldsymbol{C}_p(\boldsymbol{\theta}^{*})$, the Hessian and covariance, respectively, evaluated at $\boldsymbol{\theta}^{*}$ under the true data distribution $p$. Since the true data distribution $p$ and the parameter ...
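As a rough illustration of the kind of quantity Proposition 2.1 refers to, the sketch below assumes the classical form of TIC, $\frac{1}{n}\mathrm{Tr}(\boldsymbol{H}^{-1}\boldsymbol{C})$, with empirical plug-in estimates of the Hessian and the per-sample gradient covariance. The exact expression used by the authors is not shown here, so this form is an assumption. The model (logistic regression), the synthetic data, the damping constant, and the helper names `fit_logistic` and `tic_estimates` are hypothetical choices made only to keep the example small.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, steps=25, jitter=1e-8):
    """Fit logistic regression by Newton's method on the mean log-loss."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n
        hess = (X * (p * (1.0 - p))[:, None]).T @ X / n + jitter * np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta


def tic_estimates(X, y, theta, damping=1e-6):
    """Return (exact, diagonal) plug-in estimates of Tr(H^{-1} C) / n.

    H is the empirical Hessian of the mean loss and C is the empirical
    covariance of the per-sample gradients, both evaluated at `theta`.
    """
    n, d = X.shape
    p = sigmoid(X @ theta)
    grads = X * (p - y)[:, None]            # per-sample gradients, shape (n, d)
    grads = grads - grads.mean(axis=0)      # center before forming covariance
    C = grads.T @ grads / n
    H = (X * (p * (1.0 - p))[:, None]).T @ X / n + damping * np.eye(d)
    exact = np.trace(np.linalg.solve(H, C)) / n
    # Diagonal approximation: keep only the diagonals of H and C.
    diagonal = np.sum(np.diag(C) / np.diag(H)) / n
    return exact, diagonal


# Synthetic, well-specified data (made up for illustration).
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

theta_hat = fit_logistic(X, y)
exact, diagonal = tic_estimates(X, y, theta_hat)
print(f"exact TIC: {exact:.5f}, diagonal approximation: {diagonal:.5f}")
```

When the model is well specified, as in this synthetic setup, the Hessian and the gradient covariance approximately coincide, so $n$ times the exact estimate should be close to the number of parameters $d$. The diagonal variant drops the off-diagonal structure of both matrices and is shown only to make the cost and accuracy trade-off concrete.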

Figures (32)

Figure 1: Tr(H) vs Tr(F)
Figure 2: Exact vs block-diagonal
Figure 3: Exact vs diagonal
Figure 4: Exact vs lower bound
Figure 6: TinyMNIST
...and 27 more figures

Theorems & Definitions (9)

Remark 2.1
Proposition 2.1: Generalization Gap in NTK Regime is Equal to TIC
Remark 2.2
Remark 2.3
Remark 2.4
Proposition 3.1: $\boldsymbol{H}(\boldsymbol{\theta})$ is equal to $\boldsymbol{F}(\boldsymbol{\theta})$ through the GGN
Remark 4.1
Remark 4.2
Remark 4.3
