Fine-Grained Uncertainty Quantification via Collisions

Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon

Abstract

We propose a new and intuitive metric for aleatoric uncertainty quantification (UQ), the prevalence of class collisions defined as the same input being observed in different classes. We use the rate of class collisions to define the collision matrix, a novel and uniquely fine-grained measure of uncertainty. For a classification problem involving $K$ classes, the $K\times K$ collision matrix $S$ measures the inherent difficulty in distinguishing between each pair of classes. We discuss several applications of the collision matrix, establish its fundamental mathematical properties, and show its relationship with existing UQ methods, including the Bayes error rate (BER). We also address the new problem of estimating the collision matrix using one-hot labeled data by proposing a series of innovative techniques to estimate $S$. First, we learn a pair-wise contrastive model which accepts two inputs and determines if they belong to the same class. We then show that this contrastive model (which is PAC learnable) can be used to estimate the row Gramian matrix of $S$, defined as $G=SS^T$. Finally, we show that under reasonable assumptions, $G$ can be used to uniquely recover $S$, a new result on non-negative matrices which could be of independent interest. With a method to estimate $S$ established, we demonstrate how this estimate of $S$, in conjunction with the contrastive model, can be used to estimate the posterior class probability distribution of any point. Experimental results are also presented to validate our methods of estimating the collision matrix and class posterior distributions on several datasets.

Paper Structure

This paper contains 26 sections, 10 theorems, 63 equations, 12 figures, 3 algorithms.

Key Result

Proposition 1

The entries of the collision matrix $S$ are defined by [...]. $S$ is a row-stochastic matrix. In the special case of uniform class priors, i.e., $\boldsymbol{\pi} = \left(\frac{1}{K},\frac{1}{K},\ldots,\frac{1}{K}\right)$, $S$ is symmetric and therefore doubly stochastic.
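
As a small numerical check of these properties, the sketch below builds a collision matrix for a toy discrete problem. It assumes, in line with the notion of a class collision described in the abstract, that $S_{i,j}$ is the probability that an input drawn from class $i$ would be observed with label $j$ on an independent relabeling, i.e. $S_{i,j}=\sum_x P(x\mid i)\,P(j\mid x)$. This formula, the function name `collision_matrix`, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def collision_matrix(joint):
    """Collision matrix of a finite toy problem (hypothetical helper).

    joint[x, k] = P(X = x, C = k); every input and class must have
    nonzero total probability.
    """
    joint = np.asarray(joint, dtype=float)
    priors = joint.sum(axis=0)               # pi_k = P(C = k)
    px = joint.sum(axis=1, keepdims=True)    # P(X = x)
    posterior = joint / px                   # y_k(x) = P(C = k | X = x)
    likelihood = joint / priors              # P(X = x | C = k)
    # Assumed definition: S[i, j] = sum_x P(x | i) * P(j | x)
    return likelihood.T @ posterior, priors

# Toy joint distribution: 4 inputs, 3 classes, uniform class priors.
# Classes 0 and 1 overlap heavily; class 2 is mostly separate.
joint = np.array([
    [0.20, 0.10, 0.00],
    [0.10, 0.20, 0.00],
    [0.03, 0.03, 0.03],
    [0.00, 0.00, 0.30],
])
joint = joint / joint.sum()

S, priors = collision_matrix(joint)
print(np.round(S, 3))
print("row sums equal 1:", np.allclose(S.sum(axis=1), 1.0))
print("symmetric:", np.allclose(S, S.T))
```

Under this assumed definition, $\pi_i S_{i,j} = \pi_j S_{j,i}$ for any priors, which is why symmetry appears exactly when the priors are uniform.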

Figures (12)

  • Figure 1: Part (A) shows the collision matrix ($S$) for a $K=3$ class classification setting where classes $1$ and $2$ are easily confused with each other, but class $3$ is not. This information is encapsulated in the collision matrix by the values in positions $(1,2)$ and $(2,1)$ being larger than the other off-diagonal elements, but this information is not found in the BER. Part (B) compares the collision matrix against the uncertainty quantification (UQ) measures BER [BER_ensemble], cross-validation accuracy, Bayesian Networks [BNN_tutorial], and MC Dropout [mc_dropout].
  • Figure 2: Demonstration of how the collision matrix ($S$) and the class prior probability vector $\boldsymbol{\pi}$ can be used to calculate class precision and recall values. See Section \ref{sec:BER} for further discussion.
  • Figure 3: A comparison of $\mathcal{D}_{\text{collision}}$ (Collision Divergence) to other common statistical divergences. We compare the divergence between two normal distributions with variance $\sigma^2=1$ and means $\mu$ and $-\mu$.
  • Figure 4: Outline of our pair-wise contrastive method for estimating class posterior probability distributions $\mathbf{y}(\mathbf{x})$ from one-hot training data. Starting from one-hot data $\mathcal{T}$, we create difference training data $\mathcal{T}_{\text{diff}}$, defined in \ref{eqn:diff_data}, consisting of pairs of inputs from either the same or different classes. We next use $\mathcal{T}_{\text{diff}}$ to train the pair-wise contrastive model $V$ from Section \ref{sec:pair-wise_contrast}. The model $V$ and $\mathcal{T}$ are used to estimate the matrix $S$ (as described in Section \ref{sec:collision_algorithm}) and the expected similarity scores $\mathbf{q}(\mathbf{x})$ according to \ref{eqn:approx_simvec}. Finally, we compute $\hat{S}^{-1} \hat{\mathbf{q}}(\mathbf{x})$ to produce the estimated posterior $\hat{\mathbf{y}}(\mathbf{x})$.
  • Figure 5: Results on estimating $S$ for synthetic datasets: comparison of the Gramian-based approach (Algorithm $1$; this paper) against three baseline approaches: (1) calibrated classifier; (2) MC dropout; and (3) Bayesian neural network (BNN) ensemble. Algorithm $1$ outperforms all baselines in Scenario A and all but the BNN in Scenario B. The BNN outperforms Algorithm $1$ only after very long training times (Scenario C).
  • ...and 7 more figures
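
The last step in the Figure 4 caption, recovering $\hat{\mathbf{y}}(\mathbf{x})$ from $\hat{S}$ and $\hat{\mathbf{q}}(\mathbf{x})$, amounts to a linear solve. The following is a minimal sketch, assuming $\hat{S}$ is invertible and that the similarity scores satisfy $\mathbf{q}(\mathbf{x}) = S\,\mathbf{y}(\mathbf{x})$, as the caption's inverse implies. The function name `estimate_posterior`, the example values of `S_hat` and `y_true`, and the clip-and-renormalize step are illustrative assumptions, not the paper's stated procedure.

```python
import numpy as np

def estimate_posterior(S_hat, q_hat):
    """Recover a class posterior from S_hat and q_hat (hypothetical helper)."""
    # Solve S_hat @ y = q_hat; same result as S_hat^{-1} q_hat, but
    # avoids forming the inverse explicitly.
    y = np.linalg.solve(S_hat, q_hat)
    # Assumed post-processing: with noisy estimates the solution may leave
    # the probability simplex, so clip negatives and renormalize.
    y = np.clip(y, 0.0, None)
    return y / y.sum()

# Hypothetical 3-class collision matrix: classes 0 and 1 are easily confused.
S_hat = np.array([
    [0.70, 0.25, 0.05],
    [0.25, 0.70, 0.05],
    [0.05, 0.05, 0.90],
])

# Synthetic check: build q from a known posterior, then recover it.
y_true = np.array([0.6, 0.3, 0.1])
q_hat = S_hat @ y_true
y_hat = estimate_posterior(S_hat, q_hat)
print(np.round(y_hat, 3))
```

In practice both `S_hat` and `q_hat` would come from the trained contrastive model, so the recovered vector is only as accurate as those estimates and as the conditioning of `S_hat` allows.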

Theorems & Definitions (19)

  • Definition 1: Posterior Class Probability Distribution
  • Definition 2: Class Collision
  • Definition 3: Collision Matrix
  • Definition 4: Bayes Optimal Classifier
  • Definition 5: Probabilistic Bayes Classifier
  • Proposition 1: Properties of $S$
  • Definition 6: Collision Divergence
  • Theorem 1: Uniqueness of Row Gramian Factorization
  • Theorem: Gerschgorin Circle Theorem
  • Lemma 1
  • ...and 9 more