Estimation of Confidence Bounds in Binary Classification using Wilson Score Kernel Density Estimation

Thorbjørn Mosekjær Iversen, Zebin Duan, Frederik Hagelskjær

TL;DR

This work presents Wilson Score Kernel Density Classification, a novel kernel-based method for estimating confidence bounds in binary classification. It performs similarly to Gaussian Process Classification, but at a lower computational complexity.

Abstract

The performance and ease of use of deep learning-based binary classifiers have improved significantly in recent years. This has opened up the potential for automating critical inspection tasks, which have traditionally only been trusted to be done manually. However, the application of binary classifiers in critical operations depends on the estimation of reliable confidence bounds such that system performance can be ensured up to a given statistical significance. We present Wilson Score Kernel Density Classification, which is a novel kernel-based method for estimating confidence bounds in binary classification. The core of our method is the Wilson Score Kernel Density Estimator, which is a function estimator for estimating confidence bounds in Binomial experiments with conditionally varying success probabilities. Our method is evaluated in the context of selective classification on four different datasets, illustrating its use as a classification head of any feature extractor, including vision foundation models. Our proposed method shows similar performance to Gaussian Process Classification, but at a lower computational complexity.
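
For reference, the classical Wilson score interval for a single Binomial experiment, which gives the estimator its name, can be sketched as follows. This is the textbook formula for one fixed success probability, not the authors' Wilson Score Kernel Density Estimator. The function name `wilson_interval` and the default `z = 1.96` (roughly a two-sided 95% level) are illustrative assumptions.

```python
import math


def wilson_interval(successes: float, trials: float, z: float = 1.96):
    """Textbook Wilson score interval for a Binomial proportion.

    Returns (lower, upper) bounds on the success probability.
    `z` is the standard normal quantile for the desired confidence level
    (assumed default: 1.96, i.e. roughly 95% two-sided).
    """
    if trials <= 0:
        # No observations: nothing can be said about the probability.
        return (0.0, 1.0)

    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials

    # Centre of the interval is pulled from p_hat towards 0.5.
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)
    )
    return (max(0.0, center - half_width), min(1.0, center + half_width))


if __name__ == "__main__":
    lower, upper = wilson_interval(successes=8, trials=10)
    print(f"8 successes in 10 trials: [{lower:.3f}, {upper:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when the observed proportion is 0 or 1, which matters when the bounds are used to certify high precision.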

Paper Structure (17 sections, 5 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Overview of the Wilson Score Kernel Density Classifier when used for image classification. The image is fed to a feature extractor, e.g. a vision foundation model, and mapped to a lower-dimensional space. Confidence bounds for the class probability are computed from a set of labeled training samples using the Wilson Score Kernel Density Estimator.
  • Figure 2: Intuitive illustration of WS-KDE. (a) The Wilson Score method could in principle be used for estimating confidence bounds by binning the feature space and treating the samples in each bin as a part of separate Binomial tests. (b) WS-KDE provides a more elegant solution by combining the Wilson Score method with kernel smoothing.
  • Figure 3: As part of an automated assembly, the robot inserts a part into a fixture. A vision-based classifier is used to infer whether the insertion was successful before the assembly process can continue.
  • Figure 4: Precision/Recall reject plots for four of the experiments. In plots b and c, ResNet18 is used as the feature extractor, while plot d uses DINOv3. In b-d, dimensionality reduction is done with UMAP. The lines indicate the mean, and the shaded area indicates the 5% and 95% quantiles over 50 repeated experiments. The vertical lines mark the coverage at which the corresponding lower confidence bound equals $\tau = 95\%$. The precision to the right of this line must be above $\tau$ for the classifier to be reliable.
  • Figure 5: Plot of the mapping from ResNet18 classification confidences to the confidence bounds estimated by the WS-KDC and GPC during a single run of the Cats & Dogs ResNet18 + fc (1k) experiment.
  • ...and 1 more figures
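
The idea in the Figure 2 and Figure 4 captions, replacing hard bins with kernel smoothing and then accepting only predictions whose lower confidence bound reaches $\tau$, can be illustrated with a minimal sketch. The exact WS-KDE formula is not given in this summary, so everything below is an assumption: the Gaussian kernel, the use of kernel-weighted "effective" counts in place of Binomial counts, and the names `kernel_wilson_bounds`, `accept_mask`, and `bandwidth` are all hypothetical and may differ from the authors' estimator.

```python
import numpy as np


def kernel_wilson_bounds(x_query, x_train, y_train, bandwidth=0.5, z=1.96):
    """Illustrative kernel-weighted Wilson bounds (NOT the paper's exact estimator).

    x_query: (m, d) query features, x_train: (n, d) training features,
    y_train: (n,) binary labels in {0, 1}.
    Returns arrays (lower, upper) of shape (m,).
    """
    x_query = np.atleast_2d(np.asarray(x_query, dtype=float))
    x_train = np.atleast_2d(np.asarray(x_train, dtype=float))
    y_train = np.asarray(y_train, dtype=float)

    # Assumed Gaussian kernel weights between each query and training sample.
    sq_dist = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth**2))

    # Assumption: weighted sums play the role of trial and success counts.
    n_eff = np.maximum(weights.sum(axis=1), 1e-12)
    s_eff = weights @ y_train
    p_hat = s_eff / n_eff

    # Standard Wilson score formula applied to the effective counts.
    z2 = z * z
    denom = 1.0 + z2 / n_eff
    center = (p_hat + z2 / (2.0 * n_eff)) / denom
    half = (z / denom) * np.sqrt(p_hat * (1.0 - p_hat) / n_eff + z2 / (4.0 * n_eff**2))
    return np.clip(center - half, 0.0, 1.0), np.clip(center + half, 0.0, 1.0)


def accept_mask(lower, tau=0.95):
    """Selective classification: keep only queries whose lower bound reaches tau."""
    return np.asarray(lower) >= tau


if __name__ == "__main__":
    # Toy data: 200 positive samples clustered at the origin of a 2-D feature space.
    x_train = np.zeros((200, 2))
    y_train = np.ones(200)
    queries = np.array([[0.0, 0.0], [10.0, 10.0]])

    lower, upper = kernel_wilson_bounds(queries, x_train, y_train)
    print("lower bounds:", lower)
    print("upper bounds:", upper)
    print("accepted:", accept_mask(lower))
```

In this toy setup, the query inside the dense positive cluster receives a high lower bound and is accepted, while the query far from all training data gets an interval close to [0, 1] and is rejected. That is the qualitative behaviour a reject option relies on.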