
On the Rashomon ratio of infinite hypothesis sets

Evzenie Coupkova, Mireille Boutin

TL;DR

The Rashomon ratio can be estimated from a training dataset together with random samples from the classifier family, and the paper provides guarantees that this estimate is close to the true value of the Rashomon ratio.

Abstract

Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio of the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family and we provide guarantees that such an estimation is close to the true value of the Rashomon ratio.
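The estimation procedure described in the last sentence of the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of 0-1 loss, the synthetic two-Gaussian data, and the loss threshold `0.25` are all assumptions made for the example. Sampling the affine parameters uniformly on a unit sphere follows the description in the figure captions.

```python
import numpy as np

def sample_affine_classifiers(n_classifiers, dim, rng):
    """Draw (w, b) uniformly from the unit sphere in R^(dim + 1)."""
    params = rng.standard_normal((n_classifiers, dim + 1))
    params /= np.linalg.norm(params, axis=1, keepdims=True)
    return params[:, :dim], params[:, dim]

def empirical_losses(weights, biases, X, y):
    """0-1 loss of each classifier sign(w.x + b); labels y are in {-1, +1}."""
    scores = X @ weights.T + biases              # shape (n_points, n_classifiers)
    predictions = np.where(scores >= 0, 1, -1)
    return np.mean(predictions != y[:, None], axis=0)

def estimate_rashomon_ratio(X, y, loss_threshold, n_classifiers=1000, seed=0):
    """Fraction of randomly sampled classifiers with empirical loss below the threshold."""
    rng = np.random.default_rng(seed)
    weights, biases = sample_affine_classifiers(n_classifiers, X.shape[1], rng)
    losses = empirical_losses(weights, biases, X, y)
    return float(np.mean(losses < loss_threshold))

# Illustrative data (assumed): two 1-D Gaussian classes with means -1 and +1, sigma = 1.
data_rng = np.random.default_rng(1)
y = data_rng.choice([-1, 1], size=500)
X = (y * 1.0 + data_rng.standard_normal(500)).reshape(-1, 1)
ratio = estimate_rashomon_ratio(X, y, loss_threshold=0.25)
print(f"estimated Rashomon ratio: {ratio:.3f}")
```

Normalizing Gaussian draws is a standard way to obtain a uniform distribution on the sphere; any other sampling distribution over the classifier family would change what "proportion" means and therefore the resulting ratio.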

Paper Structure (15 sections, 13 theorems, 87 equations, 5 figures)


Key Result

Lemma 1

For all $\gamma>0$ and $\varepsilon>0$ we have

Figures (5)

  • Figure 1: Rashomon set for one-dimensional classification of a mixture of Gaussians by affine functions. Points on the circle correspond to functions in $\mathcal{F}_{\text{af}}$: blue diamonds are functions that belong to the Rashomon set with $\gamma=0.05$, and red circles are functions that do not. Each subfigure contains $1000$ samples from $\mathcal{F}_{\text{af}}$; the subfigures differ in the distance between the means ($2\mu$), while the other parameters are fixed at $d=1$, $\sigma=1$.
  • Figure 2: Rashomon ratio as a function of the distance between the means in the Gaussian mixture. The parameters are set to $d=1$, $\sigma=1$ and $\gamma=0.05$. We estimate the Rashomon ratio using $1000$ functions from $\mathcal{F}_{\text{af}}$ generated randomly according to a uniform distribution on the circle, as in Figure \ref{one_dim_circle}. The proportion of functions whose value according to (\ref{zero_com}) is smaller than $\gamma$ is the value of the Rashomon ratio shown in the plots. The minimum of the function on the left occurs when the distance between the means is $2\mu = 2$, where the Rashomon ratio is approximately $0.355$. The minimum of the function on the right is at $2\mu = 1.97$, where the Rashomon ratio is approximately $0.339$. According to Lemma \ref{ratio_hoeffding}, these estimates are within an error of $\varepsilon = 0.05$ of the true Rashomon ratio with probability at least $98\%$, since $N=1000$.
  • Figure 3: Rashomon ratio as a function of the distance between the means in the Gaussian mixture. The parameters are set to $d=2$, $\sigma=1$ and $\gamma=0.05$. The Rashomon ratio is estimated as the proportion of $1000$ functions, generated randomly according to a uniform distribution on a sphere, that have a reducible error from (\ref{excess_error_multi}) smaller than $\gamma$. The minimum of the function on the left occurs when the distance between the means is $2\|\boldsymbol{\mu}\| = 2$, where the Rashomon ratio is approximately $0.181$. The minimum of the function on the right is at $2\|\boldsymbol{\mu}\| = 2.33$, where the Rashomon ratio is approximately $0.157$. According to Lemma \ref{ratio_hoeffding}, these estimates are within an error of $\varepsilon = 0.05$ of the true Rashomon ratio with probability at least $98\%$, since $N=1000$.
  • Figure 4: Rashomon ratio as a function of the distance between the means in the Gaussian mixture. The parameters are set to $d=10$, $\sigma=1$ and $\gamma=0.05$. The Rashomon ratio is estimated as the proportion of $1000$ functions, generated randomly according to a uniform distribution on a sphere, that have a reducible error from (\ref{excess_error_multi}) smaller than $\gamma$. The minimum of the function on the left occurs when the distance between the means is $2\|\boldsymbol{\mu}\| = 3$, where the Rashomon ratio is approximately $0.001$. The minimum of the function on the right is at $2\|\boldsymbol{\mu}\| = 1.43$, where the Rashomon ratio is approximately $0$. According to Theorem \ref{properties_of_the_rashomon_ratio_multidim}, the Rashomon ratio is never $0$; the zero estimate reflects the approximation error from using only $1000$ functions, which by Lemma \ref{ratio_hoeffding} is at most $\varepsilon = 0.05$ with probability at least $98\%$, since $N=1000$.
  • Figure 5: Lower bound for the Rashomon ratio that approximates the formula (\ref{lower_bound_formula}). The parameter $\kappa$, which corresponds to the magnitude of the noise at initialization, varies between $2$ and $10$. The parameter $\varepsilon$ is equal to $7.13$, the value of the dominant term in formula (\ref{epsilon_formula}) for the Iris dataset (where only the first two classes are taken into consideration). The dimensionality of the data in this dataset is $d=4$, and we assume that the neural network used has $m=4$ nodes in its hidden layer. The probability of failure is set to $\delta=0.1$. We consider three different values for the parameter $\gamma$: the blue line with dots corresponds to $\gamma=0.10$, the orange line with triangles corresponds to $\gamma=0.11$, and the green line with pentagons corresponds to $\gamma=0.12$.
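
The "$98\%$ with $N=1000$ and $\varepsilon=0.05$" figure quoted in the captions can be checked with a short calculation. The statement of the lemma is not reproduced in this summary, so the sketch below assumes the standard two-sided Hoeffding bound $2\exp(-2N\varepsilon^2)$ for the mean of $N$ independent indicators, which is consistent with the quoted numbers. The function names are hypothetical.

```python
import math

def hoeffding_failure_probability(n_samples, epsilon):
    """Assumed two-sided Hoeffding bound on P(|estimate - truth| > epsilon)."""
    return 2.0 * math.exp(-2.0 * n_samples * epsilon ** 2)

def samples_needed(epsilon, delta):
    """Smallest N for which the assumed bound is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

failure = hoeffding_failure_probability(1000, 0.05)   # 2 * exp(-5)
confidence = 1.0 - failure
print(f"failure bound: {failure:.4f}, confidence: {confidence:.4f}")
print("samples needed for epsilon=0.05, delta=0.02:", samples_needed(0.05, 0.02))
```

Under this assumption the failure probability is about $0.0135$, so $1000$ sampled classifiers are slightly more than the minimum required for a $98\%$ guarantee at $\varepsilon = 0.05$.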

Theorems & Definitions (30)

  • Definition 1: true Rashomon set
  • Definition 2: true Rashomon ratio
  • Definition 3: empirical Rashomon set
  • Definition 4: empirical Rashomon ratio
  • Definition 5: anchored versions
  • Lemma 1
  • proof
  • Lemma 2
  • Theorem 1
  • proof
  • ...and 20 more