Hardness of Learning Neural Networks under the Manifold Hypothesis

Bobak T. Kiani; Jason Wang; Melanie Weber

TL;DR

This work analyzes how the geometry of the input data manifold affects the learnability of neural networks. It shows that data lying on manifolds of bounded curvature can be provably hard to learn in the statistical query (SQ) framework, whereas additional assumptions on the manifold's volume yield a simple interpolation-based learnability proof for efficiently sampleable manifolds, including those that can be reconstructed by manifold learning. The authors also provide cryptographic hardness results that reinforce hardness in the difficult geometric regime, and they complement the theory with experiments that validate learnability in the favorable regime and hardness in the challenging one, along with an empirical study of intrinsic dimensionality in real image datasets. Together, these results delineate when geometry helps or hinders learnability and suggest that real-world data likely reside in intermediate, heterogeneous geometric regimes that call for more nuanced algorithms or architectures.

Abstract

The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent works have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, make the learning problem efficiently solvable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds that can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real-world data.

Paper Structure (29 sections, 12 theorems, 21 equations, 6 figures, 1 table)

Introduction
Background
Basic Notation
Learning Setting
Manifold smoothness restrictions.
Related works
Learnability results
Sampleable regime
Hard regime with bounded curvature
Experiments
Empirical verification of main findings
Empirical study of geometry of data manifolds
Experimental Setup.
Results.
Discussion
...and 14 more sections

Key Result

Proposition 3.4

Let $n$-dimensional inputs be drawn from a sequence of efficiently sampleable manifolds $\mathcal{M}_n$ with intrinsic dimension $d=O(1)$ and distributions $\mathcal{D}_{\mathcal{M}_n}$ over the manifolds. Denote by $\mathcal{H}_n$ the function class of constant-depth ReLU networks on $n$ inputs with …
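The abstract attributes learnability in this regime to a simple interpolation argument: if a manifold of constant intrinsic dimension can be covered efficiently by samples, then a predictor that copies labels from nearby samples already approximates any Lipschitz target, such as a ReLU network with bounded weights. The sketch below illustrates that idea numerically under assumptions that are ours, not the paper's: the manifold is a unit circle (d = 1) isometrically embedded in R^n, the target is a hypothetical random one-hidden-layer ReLU network, and the interpolant is 1-nearest-neighbor. All function names are hypothetical, and this is not the paper's proof or algorithm.

```python
import numpy as np

def sample_circle(m, basis, rng):
    """Sample m points uniformly from a unit circle (intrinsic dimension d = 1) embedded in R^n."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=m)
    coords = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # intrinsic coordinates, shape (m, 2)
    return coords @ basis.T                                      # ambient points, shape (m, n)

def make_target(n, width, rng):
    """Hypothetical target: a random one-hidden-layer ReLU network on n inputs."""
    W = rng.standard_normal((width, n))
    b = 0.5 * rng.standard_normal(width)
    a = rng.standard_normal(width) / np.sqrt(width)
    return lambda X: np.maximum(X @ W.T + b, 0.0) @ a

def nearest_neighbor_predictor(X_train, y_train):
    """Interpolate by returning the label of the closest training point."""
    sq_norms = np.sum(X_train**2, axis=1)
    def predict(X):
        # Squared distances via |x|^2 - 2 x.y + |y|^2 (avoids building an (k, m, n) tensor).
        d2 = np.sum(X**2, axis=1)[:, None] - 2.0 * X @ X_train.T + sq_norms[None, :]
        return y_train[np.argmin(d2, axis=1)]
    return predict

def interpolation_error(m, n=100, width=8, n_test=500, seed=0):
    """Mean squared test error of nearest-neighbor interpolation from m manifold samples."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((n, 2)))  # orthonormal columns: isometric embedding
    f = make_target(n, width, rng)
    X_train = sample_circle(m, basis, rng)
    X_test = sample_circle(n_test, basis, rng)
    predict = nearest_neighbor_predictor(X_train, f(X_train))
    return float(np.mean((predict(X_test) - f(X_test)) ** 2))

for n in (10, 100, 1000):
    print(n, [round(interpolation_error(m, n=n), 5) for m in (10, 100, 1000)])
```

The test error should shrink as the number of samples m grows and should be essentially insensitive to the ambient dimension n, since only the intrinsic geometry of the circle enters the covering argument.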

Figures (6)

Figure 1: Example of a one-dimensional manifold and its medial axis. Its reach is given by the minimum distance of the medial axis to the manifold.
Figure 2: Learnability of neural networks depends on the regularity and smoothness properties of the input data manifold. In the efficiently sampleable regime, corresponding to manifolds that can be approximated well with samples, neural networks are learnable via simple interpolation arguments. In the regime where manifolds are constrained only by their curvature and intrinsic dimension, we exhibit classes of manifolds on which learning is provably hard. Real-world data likely lives in an intermediate regime with heterogeneous properties (e.g., manifolds with varying intrinsic dimension; see the empirical study of the geometry of data manifolds).
Figure 3: (a) Learning succeeds when inputs are drawn from a hypersphere of intrinsic dimension $d=10$ embedded in an ambient space of dimension $n$, an instance of the bounded positive curvature model of the isoperimetric setting (Proposition 3.5). Target functions are single-hidden-layer networks taken from the class of functions that are hard to learn under i.i.d. Gaussian inputs [diakonikolas2020algorithms]; these functions are no longer hard to learn under the input distribution considered here. (b) When the ambient dimension is large, the learning algorithm struggles to learn a single-hidden-layer neural network drawn from the class of functions in the setting of the hardness theorem, where the input data manifold has intrinsic dimension $d=1$ and reach $R=0.5$. The network trained to learn this target function is over-parameterized with respect to the target. Data are aggregated over five random realizations.
Figure 4: Sample spectrum of the singular values of a collection of score vectors from a trained diffusion model; each score vector points in the direction that denoises an image.
Figure 5: Enumeration of the Gray code for $n=7$ bits. Each column corresponds to a bitstring, where a black square denotes $0$ and a tan square denotes $1$. Note that most of the variation takes place in the last (bottom-most) entries.
...and 1 more figure
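Figure 5 enumerates a Gray code, an ordering of all bitstrings in which consecutive entries differ in exactly one bit. The sketch below assumes the standard binary-reflected Gray code g(i) = i XOR (i >> 1), with the most significant bit drawn at the top of each column; the caption does not say which Gray code is used, so this choice and the function names are assumptions. Counting bit flips along the sequence shows why variation concentrates in the bottom-most entries: the least significant bit flips on every other step, the next bit half as often, and so on.

```python
def gray_code(n_bits):
    """Binary-reflected Gray code as bit tuples, most significant bit first (top of each column)."""
    codes = []
    for i in range(2 ** n_bits):
        g = i ^ (i >> 1)  # consecutive values of g differ in exactly one bit
        codes.append(tuple((g >> (n_bits - 1 - k)) & 1 for k in range(n_bits)))
    return codes

def flip_counts(codes):
    """Number of times each bit position changes between consecutive codes."""
    n_bits = len(codes[0])
    return [sum(c1[k] != c2[k] for c1, c2 in zip(codes, codes[1:])) for k in range(n_bits)]

codes = gray_code(7)
print(flip_counts(codes))  # -> [1, 2, 4, 8, 16, 32, 64]
```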
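Figure 4 shows the singular value spectrum of score vectors from a trained diffusion model. A common rationale for such plots, assumed here, is that at small noise levels the score at a noisy point is dominated by a component pointing back to the manifold along its normal space, so roughly n - d singular values are large and d are small. The sketch below replaces the trained model with an idealized small-noise score around a flat d-dimensional subspace of R^n, adds a hypothetical O(1) tangential term as a stand-in for the data density and model error, and reads d off the largest gap in the log-spectrum. All functions and parameters are hypothetical; this is not the paper's estimation pipeline.

```python
import numpy as np

def idealized_scores(n, d, num_samples, sigma, tangential_scale, rng):
    """Hypothetical stand-in for diffusion-model scores near a flat d-dimensional manifold in R^n."""
    basis, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal basis of the tangent space
    P = basis @ basis.T                                     # orthogonal projector onto the tangent space
    x0 = basis @ rng.standard_normal(d)                     # a point on the manifold
    x_noisy = x0 + sigma * rng.standard_normal((num_samples, n))
    # Small-noise score: points from x_noisy back to the manifold, i.e. along the normal space.
    normal_part = -(x_noisy - x_noisy @ P) / sigma**2
    # Assumed O(1) tangential term, standing in for the data density and model error.
    tangential_part = tangential_scale * rng.standard_normal((num_samples, n)) @ P
    return normal_part + tangential_part

def estimate_intrinsic_dim(scores):
    """Read d off the largest gap in the log singular-value spectrum of the stacked scores."""
    s = np.linalg.svd(scores, compute_uv=False)  # singular values in descending order
    log_gaps = np.log(s[:-1]) - np.log(s[1:])
    num_normal = int(np.argmax(log_gaps)) + 1    # large singular values ~ normal directions
    return scores.shape[1] - num_normal, s

rng = np.random.default_rng(0)
scores = idealized_scores(n=20, d=3, num_samples=200, sigma=0.01, tangential_scale=1.0, rng=rng)
d_hat, spectrum = estimate_intrinsic_dim(scores)
print(d_hat)
print(np.round(spectrum, 1))
```

With a real trained model the two groups of singular values are less cleanly separated, so the largest-gap rule is only a heuristic.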

Theorems & Definitions (34)

Definition 2.1: Efficiently PAC Learnable
Definition 2.2: Reach from medial axis
Definition 3.1: $(\epsilon,\delta)$-net
Definition 3.2: Efficiently sampleable manifold
Remark 3.3
Proposition 3.4
proof
Proposition 3.5: Bounded Ricci curvature (isoperimetric setting; see [gromov1986isoperimetric; bubeck2023universal] for motivation)
proof
Theorem 3.6
...and 24 more
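Definition 2.2 and Figure 1 define the reach as the minimum distance from the medial axis to the manifold. For smooth closed manifolds, an equivalent characterization due to Federer is reach = inf over x ≠ y of |y - x|^2 / (2 dist(y - x, T_x M)), which can be evaluated on samples with known tangent spaces. The sketch below applies it to an ellipse with semi-axes a > b, whose reach is b^2/a (the radius of curvature at the ends of the major axis). The function names are hypothetical, the paper may compute reach differently, and because every pair satisfies the inequality, the discrete minimum approaches the true reach from above as sampling becomes denser.

```python
import numpy as np

def ellipse_samples(a, b, num_points):
    """Points and unit normals on the ellipse (a cos t, b sin t)."""
    t = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    points = np.stack([a * np.cos(t), b * np.sin(t)], axis=1)
    tangents = np.stack([-a * np.sin(t), b * np.cos(t)], axis=1)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    normals = np.stack([tangents[:, 1], -tangents[:, 0]], axis=1)  # rotate tangent by -90 degrees
    return points, normals

def estimate_reach(points, normals):
    """Discrete Federer formula: min over pairs of |y - x|^2 / (2 dist(y - x, T_x M))."""
    diff = points[None, :, :] - points[:, None, :]  # diff[i, j] = y_j - x_i
    sq_dist = np.sum(diff**2, axis=-1)
    # For a plane curve, the distance from y - x to the tangent line at x is |<y - x, n_x>|.
    normal_dev = np.abs(np.einsum("ijk,ik->ij", diff, normals))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(normal_dev > 0, sq_dist / (2.0 * normal_dev), np.inf)
    np.fill_diagonal(ratio, np.inf)  # exclude x == y
    return float(ratio.min())

points, normals = ellipse_samples(a=2.0, b=1.0, num_points=1000)
print(estimate_reach(points, normals))  # close to (slightly above) b**2 / a = 0.5
```

For a circle (a = b) every pair attains the ratio exactly, so the estimate equals the radius up to rounding.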