Taking Class Imbalance Into Account in Open Set Recognition Evaluation

Joanna Komorniczak; Pawel Ksieniewicz

TL;DR

Open Set Recognition (OSR) faces evaluation challenges under real-world imbalances between known (kkc) and unknown (uuc) classes. The authors analyze how standard metrics and evaluation protocols distort OSR performance as Openness and kkc/uuc distributions vary, and propose extending the evaluation with four metrics (Inner, Outer, Halfpoint, and Overall) derived from Balanced Accuracy. Through experiments with discriminative and generative baselines on CIFAR10/SVHN and MNIST/Omniglot configurations, they show that metric choice and data imbalance critically influence conclusions, with generative methods often excelling under the Outer/Overall measures while discriminative methods perform better on the Inner score. The paper delivers practical guidelines for robust OSR evaluation in imbalanced, open-world settings, advocating multiple configurations, repetition, and the Halfpoint/Overall metrics to penalize false unknowns and better reflect real-world performance.
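To illustrate why the choice of base metric matters under kkc/uuc imbalance, the sketch below compares plain accuracy with balanced accuracy (the mean of per-class recall) on a binary known-versus-unknown view of a test set. The 900/100 split, the labels, and the degenerate "always known" predictor are hypothetical values chosen for illustration. They are not taken from the paper, and the binary view is a simplification rather than the paper's exact metric definitions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical imbalanced test set: 900 known samples, 100 unknown samples.
y_true = ["known"] * 900 + ["unknown"] * 100

# Degenerate closed-set model: never rejects anything as unknown.
y_pred = ["known"] * 1000

acc = accuracy(y_true, y_pred)
bac = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bac:.2f}")
```

In this setting plain accuracy rewards a model that never detects unknown samples, while balanced accuracy drops to chance level, which is the kind of distortion the evaluation guidelines aim to expose.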

Abstract

In recent years, Deep Neural Network-based systems have not only grown in popularity but also gained increasing user trust. However, because such systems rely on the closed-world assumption, they cannot recognize samples from unknown classes and often assign an incorrect label with high confidence. The presented work examines the evaluation of methods for Open Set Recognition, focusing on the impact of class imbalance, especially in the dichotomy between known and unknown samples. As an outcome of the problem analysis, we present a set of guidelines for the evaluation of methods in this field.

Paper Structure (14 sections, 1 equation, 6 figures, 1 table)

Introduction
Open Set Recognition
Class taxonomy
Contribution and motivation
Related works
Class Imbalance in Open Set Recognition
Common evaluation strategies
Base metric impact on measured quality
Proposed experimental protocol extension
Exemplary experimental evaluation
Datasets
Method configuration and evaluation measures
Analysis of obtained results
Conclusions

Figures (6)

Figure 1: Number of samples coming from kkc and uuc in test set in relation to the dataset Openness (function of kkc and uuc number)
Figure 2: Histograms of measure values of random predictions for different configurations of kkc/uuc class numbers
Figure 3: The general confusion matrix divided into specific regions subject to individual metrics
Figure 4: Examples of derived confusion matrices based on the general matrix in Figure 3
Figure 5: An example of randomly selected samples of Training and Testing sets, where red digits come from kkc and blue from uuc
...and 1 more figure
