On the Limitations of Fractal Dimension as a Measure of Generalization

Charlie B. Tan, Inés García-Redondo, Qiquan Wang, Michael M. Bronstein, Anthea Monod

TL;DR

This study shows that the observed correlation between generalization and topological measures is confounded by the variation of hyperparameters, and that model-wise double descent manifests in these topological generalization measures.

Abstract

Bounding and predicting the generalization gap of overparameterized neural networks remains a central open problem in theoretical machine learning. There is a recent and growing body of literature that proposes the framework of fractals to model optimization trajectories of neural networks, motivating generalization bounds and measures based on the fractal dimension of the trajectory. Notably, the persistent homology dimension has been proposed to correlate with the generalization gap. This paper performs an empirical evaluation of these persistent homology-based generalization measures, with an in-depth statistical analysis. Our study reveals confounding effects in the observed correlation between generalization and topological measures due to the variation of hyperparameters. We also observe that fractal dimension fails to predict generalization of models trained from poor initializations. We lastly reveal the intriguing manifestation of model-wise double descent in these topological generalization measures. Our work forms a basis for a deeper investigation of the causal relationships between fractal geometry, topological data analysis, and neural network optimization.

Paper Structure (27 sections, 1 theorem, 23 equations, 7 figures, 4 tables)

Key Result

Theorem B.1

Let $\mu$ be a $d$-Ahlfors regular measure on a metric space and $\mathbf{x} = \{x_1, \, \dots, \, x_n\}$ i.i.d. samples from $\mu$. If $0 < \alpha < d$, then with high probability as $n \to \infty$, where $\approx$ means that the ratio of the two quantities is bounded between positive constants that do not depend on $n$.
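A scaling statement of this kind is what makes the dimension estimable from data. The sketch below assumes the relation takes the form $E_\alpha^0(\mathbf{x}) \approx n^{(d-\alpha)/d}$, where $E_\alpha^0$ is the sum of the $\alpha$-th powers of the 0-dimensional persistence lifetimes; both the form and the notation are assumptions here, following the cited work of Schweinhart. Under that assumption, the slope $m$ of $\log E_\alpha^0$ against $\log n$ gives $d \approx \alpha / (1 - m)$. The function names `e_alpha_0` and `estimate_ph_dim` are hypothetical, and the code relies on the standard fact that the 0-dimensional Vietoris–Rips lifetimes are, up to a constant factor, the edge lengths of a Euclidean minimum spanning tree. It is not the authors' implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def e_alpha_0(points, alpha=1.0):
    """Sum of alpha-th powers of minimum spanning tree edge lengths.

    Assumes the points are distinct: zero distances would be read
    as missing edges by the dense MST routine.
    """
    dists = squareform(pdist(points))
    mst = minimum_spanning_tree(dists)  # sparse matrix with n - 1 edges
    return float(np.sum(mst.data ** alpha))


def estimate_ph_dim(points, alpha=1.0, n_sizes=8, seed=0):
    """Estimate d from the slope of log E_alpha^0 versus log n.

    Sketch only: subsamples the cloud at geometrically spaced sizes
    and fits a line by least squares.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(points)
    sizes = np.unique(np.geomspace(max(n // 10, 2), n, n_sizes).astype(int))
    log_n, log_e = [], []
    for m in sizes:
        idx = rng.choice(n, size=int(m), replace=False)
        log_n.append(np.log(m))
        log_e.append(np.log(e_alpha_0(points[idx], alpha)))
    slope, _intercept = np.polyfit(log_n, log_e, 1)
    # E ~ n^{(d - alpha)/d}  =>  slope = 1 - alpha/d  =>  d = alpha / (1 - slope)
    return alpha / (1.0 - slope)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    square = rng.random((1000, 2))  # uniform sample of the unit square
    print("estimated dimension:", estimate_ph_dim(square))
```

For a uniform sample of the unit square the estimate should land near 2, and for points on a line segment near 1. Finite-sample and boundary effects bias the slope, so the value is approximate.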

Figures (7)

  • Figure 1: Adversarial initialization is a failure mode for PH dimension-based generalization measures. Training models from an adversarial initialization leads to a higher accuracy gap than training from random initialization. Both PH dimensions fail to correctly attribute higher values to the poorly generalizing models on FCN-5 MNIST and CNN CIFAR-10.
  • Figure 2: Learning rate/batch size grid results. Euclidean (top) and loss-based (bottom) PH dimension plotted against generalization gap for a range of learning rates and batch sizes.
  • Figure 3: Diagram of the causal relationships under investigation in the conditional independence test. Under $H_0$, the generalization gap is conditionally independent of the PH dimension given the learning rate, and there is no direct causal relationship between these variables. Under $H_1$, the generalization gap is conditionally dependent on the PH dimension, indicating that a causal relationship may exist.
  • Figure 4: Model-wise double descent manifests in Euclidean PH dimension, whilst neither PH dimension correlates with generalization gap in this setting. Test accuracy, generalization gap, and PH dimensions for a range of CNN widths. The double descent behavior is clearly visible in test accuracy and Euclidean PH dimension, but the generalization gap is monotonic in this critical region. Mean of three seeds with standard deviation shaded.
  • Figure 5: Vietoris–Rips filtration over two noisy circles (with 30 and 15 points, respectively) at 4 different filtration values, with the corresponding persistence barcode and diagram (0-dimensional PH in red, 1-dimensional PH in blue). Images produced using GUDHI [gudhi:urm].
  • ...and 2 more figures
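
The conditional independence question behind Figure 3 can be illustrated with a stratified permutation test. This is a minimal sketch, not necessarily the test or statistic used in the paper. It assumes the learning rate takes a small number of discrete values, uses the mean within-stratum Kendall tau as the statistic, and builds the null distribution by permuting PH dimension values only among runs that share a learning rate. The function name `stratified_permutation_test` and its parameters are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau


def stratified_permutation_test(ph_dim, gap, learning_rate, n_perm=500, seed=0):
    """Permutation p-value for H0: gap is independent of ph_dim given learning_rate.

    Sketch only: assumes a discrete conditioning variable with at
    least three runs per value and no constant columns within a stratum.
    """
    ph_dim = np.asarray(ph_dim, dtype=float)
    gap = np.asarray(gap, dtype=float)
    learning_rate = np.asarray(learning_rate)
    rng = np.random.default_rng(seed)
    # Index sets of runs that share the same learning rate.
    strata = [np.flatnonzero(learning_rate == v) for v in np.unique(learning_rate)]

    def statistic(x):
        # Mean absolute-valued association, computed stratum by stratum.
        taus = [kendalltau(x[idx], gap[idx])[0] for idx in strata]
        return abs(float(np.mean(taus)))

    observed = statistic(ph_dim)
    exceed = 0
    for _ in range(n_perm):
        permuted = ph_dim.copy()
        for idx in strata:
            # Shuffling within a stratum keeps the learning-rate effect intact.
            permuted[idx] = rng.permutation(ph_dim[idx])
        if statistic(permuted) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

A small p-value is evidence against $H_0$, meaning the PH dimension carries information about the generalization gap beyond what the learning rate explains. A large p-value is consistent with the correlation being driven by the learning rate alone.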

Theorems & Definitions (12)

  • Definition 2.1: schweinhart_fractal_2020
  • Definition B.1
  • Definition B.2
  • Definition B.3
  • Definition B.4
  • Definition B.5
  • Definition B.6
  • Remark 1
  • Definition B.7
  • Definition B.8: schweinhart_fractal_2020
  • ...and 2 more