Towards Exact Computation of Inductive Bias

Akhilan Boopathy; William Yue; Jaedong Hwang; Abhiram Iyer; Ila Fiete

TL;DR

The proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.

Abstract

Much research in machine learning involves finding appropriate inductive biases (e.g. convolutional neural networks, momentum-based optimizers, transformers) to promote generalization on tasks. However, quantification of the amount of inductive bias associated with these architectures and hyperparameters has been limited. We propose a novel method for efficiently computing the inductive bias required for generalization on a task with a fixed training data budget; formally, this corresponds to the amount of information required to specify well-generalizing models within a specific hypothesis space of models. Our approach involves modeling the loss distribution of random hypotheses drawn from a hypothesis space to estimate the required inductive bias for a task relative to these hypotheses. Unlike prior work, our method provides a direct estimate of inductive bias without using bounds and is applicable to diverse hypothesis spaces. Moreover, we derive approximation error bounds for our estimation approach in terms of the number of sampled hypotheses. Consistent with prior results, our empirical results demonstrate that higher dimensional tasks require greater inductive bias. We show that relative to other expressive model classes, neural networks as a model class encode large amounts of inductive bias. Furthermore, our measure quantifies the relative difference in inductive bias between different neural network architectures. Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.
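The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes that the required inductive bias is read as the negative log of the fraction of randomly drawn hypotheses whose test loss falls below the target error, and it uses the raw empirical fraction rather than the fitted loss distribution the paper describes. The toy hypothesis space (slopes `w` drawn uniformly from [-1, 1]), the target slope, the function names, and the use of natural logarithms (nats) are all illustrative assumptions, not the authors' setup.

```python
import math
import random


def estimate_inductive_bias(losses, epsilon):
    """Return -log of the fraction of sampled hypotheses with loss <= epsilon (nats)."""
    hits = sum(1 for loss in losses if loss <= epsilon)
    if hits == 0:
        # No sampled hypothesis reaches the target error: the empirical estimate is unbounded.
        return math.inf
    return -math.log(hits / len(losses))


def toy_losses(n_hypotheses, n_test=1000, true_slope=0.5, seed=0):
    """Test losses of random linear hypotheses h(x) = w * x on a toy regression task."""
    rng = random.Random(seed)
    # Hypothetical input distribution p_x: uniform on [-1, 1].
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_test)]
    mean_x2 = sum(x * x for x in xs) / n_test
    losses = []
    for _ in range(n_hypotheses):
        # Hypothetical hypothesis distribution p_h: slope uniform on [-1, 1].
        w = rng.uniform(-1.0, 1.0)
        # Mean squared error of w*x against true_slope*x over the test sample.
        losses.append((w - true_slope) ** 2 * mean_x2)
    return losses


losses = toy_losses(100_000)
bias_nats = estimate_inductive_bias(losses, epsilon=0.003)
print(f"estimated inductive bias: {bias_nats:.2f} nats")
```

When the target error is so small that no sampled hypothesis reaches it, this empirical fraction returns infinity, which is one motivation for modeling the loss distribution parametrically instead.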

Paper Structure (26 sections, 1 theorem, 51 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Generalization vs. Sample Complexity
Generalization vs. Inductive Bias Complexity
Quantifying Inductive Bias
Definition of Inductive Bias
Efficiently Sampling from the Hypothesis Space
Direct Optimization by Gradient Descent
Kernel-based Sampling
Modeling the Test Error Distribution
Bounding the Approximation Error
Experimental Results
Inductive Bias Required to Generalize on Different Tasks
Inductive Bias in Different Models
Discussion
...and 11 more sections

Key Result

Theorem 2

Suppose we are provided a hypothesis distribution $p_h$, input distribution $p_x$, loss function $L$ and desired error rate $\varepsilon$. Suppose we estimate $I(\varepsilon, p_h, p_x, L)$ by first sampling $n$ hypotheses ($h^1, h^2, \ldots, h^n$) i.i.d. from $q_h$, which is close to $p_h$ in the sense that for all $h$. We then compute the test losses of each hypothesis $\mathbb{E}_{x \sim p_x}[L(h^1, x)]$ ...

Figures (5)

Figure 1: An illustration of example hypothesis spaces, model classes, and specific models for a particular learning problem. Red circles indicate training points and black curves indicate hypotheses. A hypothesis space sets the broad set of models we wish to consider. In this illustration, we consider the hypothesis space of all functions and a smaller hypothesis space of band-limited functions (i.e. functions with limited maximum frequency). A model class is a set of models associated with a particular set of inductive biases. We measure the required amount of inductive bias to solve a task based on the size of the well-generalizing region within the context of a particular hypothesis space.
Figure 2: Illustration of how the required inductive bias for a task can be computed from the hypothesis space and the region of well-generalizing hypotheses. Black boxes indicate hypothesis spaces; $p_h$ is a uniform distribution over each box. Purple indicates regions of well-generalizing hypotheses. Inductive bias is the negative log of the fraction of hypothesis space that generalizes well: $I = -\log \frac{\text{Hypothesis space} \cap \text{Well-generalizing hypotheses}}{\text{Hypothesis space}}$. It depends on both the size of the hypothesis space and how much the hypothesis space overlaps with well-generalizing hypotheses. Different hypothesis spaces may yield different inductive bias estimates even on the same task (i.e. the same set of well-generalizing hypotheses).
Figure 3: Fitting a scaled non-central Chi-squared distribution to an empirical distribution of mean squared errors of models drawn from a kernel-based Gaussian RBF hypothesis space on a restricted version of MNIST. Observe that the distribution closely models the empirical distribution.
Figure 4: Distribution of hypothesis losses for MNIST after 5, 10, and 15 epochs of gradient descent. Notice that the update in distribution is minimal.
Figure 5: Fitted scaled non-central Chi-squared distributions for the test set errors on MNIST, CIFAR-10, Omniglot, and Inverted Pendulum tasks under a Gaussian RBF kernel hypothesis space.
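One way to connect the captions above is to write the fraction in Figure 2 as a probability under $p_h$ and then substitute the fitted loss distribution from Figures 3 and 5. This is a hedged reading based only on the captions and the notation $I(\varepsilon, p_h, p_x, L)$ used in Theorem 2; the symbol $\hat{F}$, denoting the cumulative distribution function of the fitted scaled non-central Chi-squared distribution, is introduced here for illustration and does not appear in the source.

$$
I(\varepsilon, p_h, p_x, L) = -\log \Pr_{h \sim p_h}\left[\mathbb{E}_{x \sim p_x}[L(h, x)] \le \varepsilon\right] \approx -\log \hat{F}(\varepsilon)
$$

As a worked example under this reading, if a fraction $10^{-6}$ of the hypothesis space generalizes well, then $I = 6 \ln 10 \approx 13.8$ nats, or about $19.9$ bits if the logarithm is taken base 2. The base of the logarithm is an assumption here; the captions do not specify it.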

Theorems & Definitions (3)

Definition 1
Theorem 2
proof
