Large-Scale Evaluation of Open-Set Image Classification Techniques

Halil Bisgin; Andres Palechor; Mike Suter; Manuel Günther

TL;DR

This work addresses the need for realistic, large-scale evaluation of open-set image classification. It systematically compares training-based OSC losses (SoftMax, Garbage, EOS) with post-processing methods (MSS, MLS, OpenMax, EVM, PROSER) across three ImageNet-based protocols that vary the semantic distance between known and unknown classes. Key findings show that Entropic Open-Set (EOS) training generally improves discrimination of negatives and unknowns, and that hybrid approaches (e.g., EOS with OpenMax or PROSER) yield the strongest gains in settings with semantically distant unknowns, while performance is more mixed for harder unknowns. The work provides reproducible code and framing to benchmark OSC methods fairly at scale, guiding future development toward robust open-set recognition in real-world deployments.

Abstract

The goal of classification is to correctly assign labels to unseen samples. However, most methods misclassify samples with unseen labels and assign them to one of the known classes. Open-Set Classification (OSC) algorithms aim to maximize both closed-set and open-set recognition capabilities. Recent studies showed the utility of such algorithms on small-scale data sets, but limited experimentation makes it difficult to assess their performance on real-world problems. Here, we provide a comprehensive comparison of various OSC algorithms, including training-based methods (SoftMax, Garbage, EOS) and post-processing methods (Maximum SoftMax Scores, Maximum Logit Scores, OpenMax, EVM, PROSER), where the latter are applied to features from the former. We perform our evaluation on three large-scale protocols that mimic real-world challenges, where we train on known and negative open-set samples, and test on known and unknown instances. Our results show that EOS helps to improve the performance of almost all post-processing algorithms. In particular, OpenMax and PROSER are able to exploit better-trained networks, demonstrating the utility of hybrid models. However, while most algorithms work well on negative test samples (samples of open-set classes seen during training), they tend to perform poorly when tested on samples of previously unseen unknown classes, especially in challenging conditions.
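The two simplest post-processing methods named in the abstract, Maximum SoftMax Scores (MSS) and Maximum Logit Scores (MLS), can be illustrated with a minimal sketch. It assumes the network's logits are already available as a NumPy array; the function names, the threshold value, and the `unknown_label` convention are hypothetical choices for illustration and are not taken from the paper's code.

```python
import numpy as np

def softmax(logits):
    """Row-wise SoftMax, subtracting the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mss_scores(logits):
    """Maximum SoftMax Score: predicted class and its SoftMax probability."""
    probs = softmax(logits)
    return probs.argmax(axis=1), probs.max(axis=1)

def mls_scores(logits):
    """Maximum Logit Score: predicted class and its raw logit value."""
    return logits.argmax(axis=1), logits.max(axis=1)

def reject_below(predictions, scores, threshold, unknown_label=-1):
    """Replace predictions whose score falls below the threshold (assumed convention)."""
    out = predictions.copy()
    out[scores < threshold] = unknown_label
    return out

# Toy logits for two samples and three known classes (illustrative values only).
logits = np.array([[4.0, 0.5, 0.1],   # one class clearly dominates
                   [0.2, 0.3, 0.1]])  # no class stands out

mss_pred, mss = mss_scores(logits)
mls_pred, mls = mls_scores(logits)
final = reject_below(mss_pred, mss, threshold=0.5)
```

Both scores pick the same class, since SoftMax preserves the ordering of the logits; they differ only in the value that is compared against the threshold. In this toy example the second sample is rejected under MSS because its highest probability stays below 0.5.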

Paper Structure (24 sections, 14 equations, 6 figures, 11 tables)

Introduction
Open-Set Classification Taxonomy
Training-based Methods
Post-processing Methods
Evaluated Methods
Training-based Methods
Post-processing Methods
Data Set and Evaluation Protocols
Evaluation Protocols
Evaluation Metric
Single-Valued Evaluation Metric
Comparison to Other Metrics
Experiments
Network Training
Hyperparameter Optimization
...and 9 more sections

Figures (6)

Figure 1: Open-Set Protocols for ImageNet. This figure shows the partition of classes into known, negative, and unknown within the three protocols, $P_{1}$, $P_{2}$, and $P_{3}$. Following the WordNet hierarchy [miller1998wordnet], in which dashed lines indicate an "is-a" relationship, we sample our final classes from the leaf nodes of the intermediate-level superclasses named above the colored bars. The color of each bar indicates whether the subclasses of that superclass are sampled as knowns, negatives, or unknowns. For example, all subclasses of "Dog" are used as known classes in $P_{1}$, while the subclasses of "Hunting Dog" are partitioned into knowns and negatives in $P_{2}$. In contrast, $P_{3}$ has several intermediate nodes whose subclasses are partitioned into known, negative, and unknown classes. These partitions also determine the training, validation, and test sets in each protocol: known and negative classes are available during training and validation, while unknown classes appear only at test time. More details on the protocols are provided in [palechor2023protocols].
Figure 2: Processing with Deep Networks. An image is presented to the backbone network, which extracts deep features $\vec{\varphi}$ that are then processed with a Linear layer to logits $\vec{z}$, and further with SoftMax to probabilities $\vec{y}$.
Figure 3: OSCR Plots. This figure shows OSCR plots for negative and unknown test samples on all three protocols, split across training-based methods. Colors separate post-processing methods.
Figure 4: Score Distributions. This figure shows score distributions extracted from the network trained with three different loss functions, and further post-processed with all algorithms (except for MLS), on protocol $P_{1}$.
Figure 5: Score Distributions. This figure shows score distributions extracted from the network trained with various loss functions on protocol $P_{2}$ and protocol $P_{3}$, and further processed with all algorithms (except for MLS).
...and 1 more figures
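
The OSCR plots referenced in Figure 3 relate a Correct Classification Rate (CCR) on known samples to a False Positive Rate (FPR) on negative or unknown samples as a score threshold varies. The following is a minimal sketch of that computation under the commonly used OSCR definition; it is not the paper's evaluation code, and the function names, input layout, and toy values are assumptions made for illustration.

```python
import numpy as np

def oscr_points(known_scores, known_correct, unknown_scores):
    """Compute (FPR, CCR) pairs, one per candidate threshold.

    known_scores:   score of the predicted class for each known test sample
    known_correct:  True where the predicted class equals the target class
    unknown_scores: maximum score for each negative or unknown test sample
    """
    known_scores = np.asarray(known_scores, dtype=float)
    known_correct = np.asarray(known_correct, dtype=bool)
    unknown_scores = np.asarray(unknown_scores, dtype=float)

    # Every observed score is a candidate threshold.
    thresholds = np.unique(np.concatenate([known_scores, unknown_scores]))

    # CCR: fraction of known samples that are correct AND score >= threshold.
    ccr = np.array([(known_correct & (known_scores >= t)).mean() for t in thresholds])
    # FPR: fraction of unknown samples that score >= threshold.
    fpr = np.array([(unknown_scores >= t).mean() for t in thresholds])
    return fpr, ccr

def ccr_at_fpr(fpr, ccr, max_fpr):
    """Best CCR among thresholds whose FPR does not exceed max_fpr."""
    ok = fpr <= max_fpr
    return float(ccr[ok].max()) if ok.any() else 0.0

# Toy scores (illustrative values only).
fpr, ccr = oscr_points(
    known_scores=[0.9, 0.8, 0.6, 0.4],
    known_correct=[True, True, False, True],
    unknown_scores=[0.7, 0.3, 0.2, 0.1],
)
best_at_25 = ccr_at_fpr(fpr, ccr, max_fpr=0.25)
best_at_0 = ccr_at_fpr(fpr, ccr, max_fpr=0.0)
```

Plotting `ccr` against `fpr` (typically with a logarithmic FPR axis) would give a curve of the kind shown in Figure 3. Reading off the CCR at a fixed FPR, as `ccr_at_fpr` does, is one simple way to compare methods at the same operating point.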
