A method for classification of data with uncertainty using hypothesis testing

Shoma Yokura; Akihisa Ichiki

TL;DR

The paper tackles overconfident binary classification in overlapping and out-of-distribution regions by introducing a framework built on two types of hypothesis tests, which draws on the empirical distribution of discriminant features computed from the training data. The method uses the test statistic $t = g(x)$ and sets decision thresholds at the $α$- and $(1-α)$-quantiles of that distribution, enabling explicit detection of ambiguous and out-of-distribution data without resampling or model modification. On a spiral-pattern benchmark and on pneumonia detection with DenseNet-121, it visualizes uncertainty through acceptance regions and shows how varying $α$ controls the trade-off between coverage and predictive accuracy. The approach offers a principled, low-cost alternative for uncertainty-aware decisions in high-stakes settings and is applicable to domains beyond medical imaging.

Abstract

Binary classification assigns each data point to one of two distinct classes and is widely used in various fields. However, conventional classifiers tend to make overconfident predictions for data that lie in the overlapping region of the two class distributions and for data that fall outside both distributions (out-of-distribution data). Conventional classifiers are therefore ill-suited to high-risk fields, where classification results can have significant consequences. Addressing this issue requires quantifying uncertainty and adopting decision-making approaches that take it into account. Many methods have been proposed for this purpose; however, implementing them often requires resampling, changing the structure or improving the performance of the model, or optimizing the classifier's thresholds. We propose a new decision-making approach that uses two types of hypothesis testing. The method can detect ambiguous data that belong to the overlapping region of the two class distributions, as well as out-of-distribution data that are not covered by the training data distribution. In addition, we quantify uncertainty using the empirical distribution of the feature values that the trained model produces for the training data. The classification thresholds are determined by the $α$-quantile and the ($1-α$)-quantile, where the significance level $α$ is set according to the specific situation.
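The decision rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each of the two tests accepts a sample when its statistic $t = g(x)$ lies between the $α$-quantile and the ($1-α$)-quantile of one class's empirical training scores, and that a sample accepted by both tests is treated as ambiguous while a sample accepted by neither is treated as out-of-distribution. The names `empirical_quantile`, `fit_acceptance_regions`, and `decide` are hypothetical, and the scores are synthetic.

```python
def empirical_quantile(values, q):
    """Quantile of a sample using linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def fit_acceptance_regions(scores_neg, scores_pos, alpha):
    """Assumed rule: each class's acceptance region is [alpha, 1 - alpha] quantiles."""
    region_neg = (empirical_quantile(scores_neg, alpha),
                  empirical_quantile(scores_neg, 1 - alpha))
    region_pos = (empirical_quantile(scores_pos, alpha),
                  empirical_quantile(scores_pos, 1 - alpha))
    return region_neg, region_pos


def decide(t, regions):
    """Combine the two tests into one of four outcomes (assumed combination rule)."""
    (lo_neg, hi_neg), (lo_pos, hi_pos) = regions
    in_neg = lo_neg <= t <= hi_neg  # test 1: "x belongs to the negative class" not rejected
    in_pos = lo_pos <= t <= hi_pos  # test 2: "x belongs to the positive class" not rejected
    if in_neg and in_pos:
        return "ambiguous"
    if in_neg:
        return "negative"
    if in_pos:
        return "positive"
    return "out-of-distribution"


# Synthetic training-time scores g(x) for each class (illustrative only).
scores_neg = [-3 + 0.04 * i for i in range(101)]  # spans -3 .. 1
scores_pos = [-1 + 0.04 * i for i in range(101)]  # spans -1 .. 3
regions = fit_acceptance_regions(scores_neg, scores_pos, alpha=0.05)

for t in (-2.0, 0.0, 2.0, 5.0):
    print(t, decide(t, regions))
```

Under these assumptions, increasing `alpha` narrows both acceptance regions, so more samples are flagged as out-of-distribution and fewer as ambiguous; the appropriate value depends on the application.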
