On the Rashomon ratio of infinite hypothesis sets

Evzenie Coupkova; Mireille Boutin

TL;DR

The Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family, and guarantees are provided that such an estimate is close to the true value of the Rashomon ratio.

Abstract

Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio of the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family and we provide guarantees that such an estimation is close to the true value of the Rashomon ratio.
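The estimation procedure described in the abstract (a training dataset plus random samples from the classifier family) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `sample_mixture` and `empirical_rashomon_ratio` are hypothetical, the classifiers are affine functions with parameters drawn uniformly from the unit sphere, and membership in the Rashomon set is assumed to mean that the empirical 0-1 loss exceeds the best sampled loss by less than $\gamma$.

```python
import numpy as np


def sample_mixture(n, mu, sigma=1.0, d=1, seed=0):
    """Draw n labelled points from two Gaussians with means +/- mu along axis 0."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)           # class labels
    mean = np.zeros(d)
    mean[0] = mu                               # means are 2*mu apart
    X = y[:, None] * mean + sigma * rng.standard_normal((n, d))
    return X, y


def empirical_rashomon_ratio(X, y, n_classifiers=1000, gamma=0.05, seed=0):
    """Fraction of randomly sampled affine classifiers with small excess loss.

    Assumption (not taken from the paper): excess loss is measured against
    the best classifier among the random samples.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Normalized Gaussian vectors are uniform on the unit sphere in R^(d+1).
    params = rng.standard_normal((n_classifiers, d + 1))
    params /= np.linalg.norm(params, axis=1, keepdims=True)
    w, b = params[:, :d], params[:, d]
    scores = X @ w.T + b                       # shape (n_samples, n_classifiers)
    preds = np.where(scores >= 0, 1, -1)
    losses = np.mean(preds != y[:, None], axis=0)   # empirical 0-1 loss
    excess = losses - losses.min()
    return float(np.mean(excess < gamma))


X, y = sample_mixture(n=500, mu=1.0)
print(empirical_rashomon_ratio(X, y))
```

Fixing the seed keeps the sampled classifiers identical across calls, so estimates for different values of $\gamma$ are directly comparable.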

Paper Structure (15 sections, 13 theorems, 87 equations, 5 figures)

Introduction
Related work
Preliminaries
Notation
Rashomon set, Rashomon ratio
Numerical estimation of Rashomon ratio - guarantees
Rashomon ratio for affine classifiers applied to a mixture of two Gaussians
One-dimensional case
Higher-dimensional case
Rashomon ratio of a two-layer neural network applied to dataset with positive-definite Gram matrix
Rashomon set contains an $\varepsilon$-net for the hypothesis family
Lower bound for the Rashomon ratio
Advantage of a large Rashomon ratio for infinite classifier families
Application of the Theorems to the experimental results
Conclusion

Key Result

Lemma 1

For all $\gamma>0$ and $\varepsilon>0$ we have

Figures (5)

Figure 1: Rashomon set for one-dimensional classification of a mixture of Gaussians by affine functions. Points on the circle correspond to functions in $\mathcal{F}_{\text{af}}$: blue diamonds are functions that belong to the Rashomon set with $\gamma=0.05$, and red circles are functions that do not. Each subfigure contains $1000$ samples from $\mathcal{F}_{\text{af}}$; the subfigures differ in the distance between the means ($2\mu$), while the other parameters are fixed: $d=1$, $\sigma=1$.
Figure 2: Rashomon ratio as a function of the distance between the means in the Gaussian mixture. The parameters are set to $d=1$, $\sigma=1$ and $\gamma=0.05$. We estimate the Rashomon ratio using $1000$ functions from $\mathcal{F}_{\text{af}}$ generated randomly according to a uniform distribution on the circle, as in Figure \ref{one_dim_circle}. The Rashomon ratio in the plots is the proportion of functions whose value according to (\ref{zero_com}) is smaller than $\gamma$. The minimum of the function on the left is at $2\mu = 2$, where the Rashomon ratio is approximately $0.355$. The minimum of the function on the right is at $2\mu = 1.97$, where the Rashomon ratio is approximately $0.339$. According to Lemma \ref{ratio_hoeffding}, these estimates are within an error of $\varepsilon = 0.05$ of the true Rashomon ratio with probability at least $98\%$, since $N=1000$.
Figure 3: Rashomon ratio as a function of the distance between the means in the Gaussian mixture. The parameters are set to $d=2$, $\sigma=1$ and $\gamma=0.05$. The Rashomon ratio is estimated as the proportion of $1000$ functions, generated randomly according to a uniform distribution on a sphere, that have a reducible error from (\ref{excess_error_multi}) smaller than $\gamma$. The minimum of the function on the left is at $2\|\boldsymbol{\mu}\| = 2$, where the Rashomon ratio is approximately $0.181$. The minimum of the function on the right is at $2\|\boldsymbol{\mu}\| = 2.33$, where the Rashomon ratio is approximately $0.157$. According to Lemma \ref{ratio_hoeffding}, these estimates are within an error of $\varepsilon = 0.05$ of the true Rashomon ratio with probability at least $98\%$, since $N=1000$.
Figure 4: Rashomon ratio as a function of the distance between the means in the Gaussian mixture. The parameters are set to $d=10$, $\sigma=1$ and $\gamma=0.05$. The Rashomon ratio is estimated as the proportion of $1000$ functions, generated randomly according to a uniform distribution on a sphere, that have a reducible error from (\ref{excess_error_multi}) smaller than $\gamma$. The minimum of the function on the left is at $2\|\boldsymbol{\mu}\| = 3$, where the Rashomon ratio is approximately $0.001$. The minimum of the function on the right is at $2\|\boldsymbol{\mu}\| = 1.43$, where the Rashomon ratio is approximately $0$. According to Theorem \ref{properties_of_the_rashomon_ratio_multidim}, the Rashomon ratio is never $0$, but an approximation based on only $1000$ functions carries some error: according to Lemma \ref{ratio_hoeffding}, the error is less than or equal to $\varepsilon = 0.05$ with probability at least $98\%$, since $N=1000$.
Figure 5: Lower bound for the Rashomon ratio that approximates the formula (\ref{lower_bound_formula}). The parameter $\kappa$, which corresponds to the magnitude of the noise at initialization, varies between $2$ and $10$. The parameter $\varepsilon$ is equal to $7.13$, the value of the dominant term in formula (\ref{epsilon_formula}) for the Iris dataset (where only the first two classes are taken into consideration). The dimensionality of the data in this dataset is $d=4$, and we assume that the neural network has $m=4$ nodes in its hidden layer. The probability of failure is set to $\delta=0.1$. We consider three values of the parameter $\gamma$: the blue line with dots corresponds to $\gamma=0.10$, the orange line with triangles to $\gamma=0.11$, and the green line with pentagons to $\gamma=0.12$.
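
The "at least $98\%$" confidence quoted in the captions of Figures 2 to 4 can be checked with a short calculation. The sketch below assumes that the lemma cited there is the standard two-sided Hoeffding bound for the mean of $N$ independent indicator variables; the lemma's exact statement is not reproduced in this summary, and the symbols $\hat{R}_N$ (the estimate from $N$ sampled classifiers) and $R$ (the true Rashomon ratio) are introduced here only for illustration.

```latex
\Pr\left( \left| \hat{R}_N - R \right| > \varepsilon \right)
  \le 2\exp\left(-2N\varepsilon^{2}\right)
  = 2\exp\left(-2 \cdot 1000 \cdot 0.05^{2}\right)
  = 2e^{-5} \approx 0.0135
```

Under this assumption the estimate lies within $\varepsilon = 0.05$ of the true ratio with probability at least $1 - 0.0135 \approx 98.65\%$, which agrees with the stated $98\%$.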

Theorems & Definitions (30)

Definition 1: true Rashomon set
Definition 2: true Rashomon ratio
Definition 3: empirical Rashomon set
Definition 4: empirical Rashomon ratio
Definition 5: anchored versions
Lemma 1
proof
Lemma 2
Theorem 1
proof
...and 20 more