GHOST: Gaussian Hypothesis Open-Set Technique

Ryan Rabinowitz, Steve Cruz, Manuel Günther, Terrance E. Boult

TL;DR

Open-set recognition on large-scale data often relies on global metrics that obscure per-class fairness. GHOST introduces a hyperparameter-free approach that models each known class with a per-class Gaussian in embedding space and uses a logit-normalized, monotone open-set score to detect unknowns. Across ImageNet-1K and multiple OOD/OSR datasets, GHOST delivers state-of-the-art performance on AUOSCR, AUROC, and FPR95 while reducing per-class performance variance, highlighting improved fairness. The method is simple, scalable, and accompanied by code, offering a practical, fair, and effective solution for large-scale OSR.

Abstract

Evaluations of large-scale recognition methods typically focus on overall performance. While this approach is common, it often fails to provide insights into performance across individual classes, which can lead to fairness issues and misrepresentation. Addressing these gaps is crucial for accurately assessing how well methods handle novel or unseen classes and for ensuring a fair evaluation. To address fairness in Open-Set Recognition (OSR), we demonstrate that per-class performance can vary dramatically. We introduce the Gaussian Hypothesis Open-Set Technique (GHOST), a novel hyperparameter-free algorithm that models deep features using class-wise multivariate Gaussian distributions with diagonal covariance matrices. We apply Z-score normalization to logits to mitigate the impact of feature magnitudes that deviate from the model's expectations, thereby reducing the likelihood of the network assigning a high score to an unknown sample. We evaluate GHOST across multiple ImageNet-1K pre-trained deep networks and test it with four different unknown datasets. Using standard metrics such as AUOSCR, AUROC, and FPR95, we achieve statistically significant improvements, advancing the state of the art in large-scale OSR. Source code is provided online.
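
The modeling step described in the abstract can be illustrated with a minimal sketch. It fits one mean and one standard deviation per embedding dimension for each class (a diagonal-covariance Gaussian), then uses the Gaussian of the predicted class to turn an embedding into per-dimension z-scores. The function names `fit_class_gaussians` and `ghost_like_score` are hypothetical, labels are assumed to be integer class indices that match logit positions, and the way the maximum logit is combined with the z-scores (division by their summed absolute value) is an assumption made for illustration; the text above does not spell out the exact formula.

```python
import math

EPS = 1e-8  # assumption: small constant to avoid division by zero


def fit_class_gaussians(embeddings, labels):
    """Fit a per-class mean and standard deviation for every embedding
    dimension, i.e. a multivariate Gaussian with diagonal covariance."""
    by_class = {}
    for phi, k in zip(embeddings, labels):
        by_class.setdefault(k, []).append(phi)

    stats = {}
    for k, rows in by_class.items():
        n = len(rows)
        dims = len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dims)]
        # Population standard deviation per dimension, padded by EPS.
        std = [
            math.sqrt(sum((r[d] - mean[d]) ** 2 for r in rows) / n) + EPS
            for d in range(dims)
        ]
        stats[k] = (mean, std)
    return stats


def ghost_like_score(phi, logits, stats):
    """Return (predicted class, score) for one sample.

    The score divides the maximum logit by the summed absolute z-scores
    of the embedding under the predicted class's Gaussian. This
    combination is an illustrative assumption, not a confirmed formula.
    """
    k_hat = max(range(len(logits)), key=lambda k: logits[k])
    mean, std = stats[k_hat]
    abs_z = [abs(p - m) / s for p, m, s in zip(phi, mean, std)]
    return k_hat, logits[k_hat] / (sum(abs_z) + EPS)
```

Under this sketch, an embedding far from the predicted class's mean produces large z-scores and therefore a low score even when the maximum logit is high, which matches the intuition in the abstract. No threshold or tunable weight appears in the computation.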

Paper Structure

This paper contains 25 sections, 7 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Class-wise Open-Set Recognition. OSCR comparison using the MAE-H architecture with OpenImage-O as unknowns. Overall performance is the solid line; average performance on easy (top 10%) and hard (bottom 10%) classes is shown as dashed and dotted lines, respectively. We compare GHOST with Maximum Softmax Probability (MSP) and NNGuide, and we show the area under the curve (AUC) of each method's overall OSCR. GHOST outperforms both in each setting and maintains its correct classification rate as the FPR decreases, while the others fall off dramatically; hence, GHOST maintains fairness in difficult cases while improving overall OSR.
  • Figure 2: GHOST Scores. In a pre-trained network (indicated with solid arrows), an image is presented to the backbone network, which extracts deep feature embeddings $\vec{\varphi}$. These are processed by a Linear layer into logits $\vec{z}$, and then by SoftMax into probabilities $\vec{y}$. To train GHOST, we extract embeddings from the training data and use them to model class-wise multivariate Gaussian distributions. During evaluation, the Gaussian of the predicted class is used to turn the embeddings $\vec{\varphi}$ into z-scores, which are used together with the maximum logit $z_{\hat{k}}$ to compute the GHOST score $\gamma$.
  • Figure 3: GHOST modeling of a Multivariate Gaussian per Class. Samples of Gaussians from the MAE-H network are shown on the left, sampled once every 30 dimensions. Dimensions were sorted by mean value to improve visibility, and the spread shows that some dimensions have greater variance than others. The plot also shows the per-dimension z-scores of a correctly classified hammerhead image (known, in green) and of an OOD example, a shark (red) misclassified as a hammerhead. The z-scores of the OOD example are much larger than those of the known sample. More examples are in the supplemental material.
  • Figure 4: Unfairness (Coefficient of Variation). This figure shows the unfairness of OSR algorithms across False Positive Rates using the MAE-H network with OpenImages as unknowns. All algorithms include the inherent unfairness of the base classifier, visible on the far right, but GHOST maintains its level much better as the FPR decreases toward the left.
  • Figure 5: OSCR in Logscale. In applications where false positives are costly or potential unknowns are numerous, performance at low FPR matters most, and a log-scale FPR axis, as shown here, is more useful. The global performance is presented as a solid line, while the top 10% is dashed and the bottom 10% is dotted. In all cases, GHOST is significantly better at low FPR levels, and below an FPR of 0.1 GHOST's bottom-10% performance is better than most algorithms' top-10%.
  • ...and 6 more figures
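
The quantities plotted in the captions above can be sketched in a few lines. The sketch uses the standard OSCR definitions: at a score threshold, the correct classification rate (CCR) is the fraction of known samples that are both correctly classified and scored at or above the threshold, and the false positive rate (FPR) is the fraction of unknown samples scored at or above it. Treating unfairness as the coefficient of variation of per-class CCR is an assumption based on the Figure 4 caption, and all function names are hypothetical.

```python
import statistics


def ccr_fpr(known_scores, known_correct, unknown_scores, threshold):
    """One point of an OSCR curve at the given score threshold."""
    accepted_correct = sum(
        1 for s, c in zip(known_scores, known_correct) if c and s >= threshold
    )
    false_positives = sum(1 for s in unknown_scores if s >= threshold)
    return (
        accepted_correct / len(known_scores),
        false_positives / len(unknown_scores),
    )


def per_class_ccr(known_scores, known_correct, known_labels, threshold):
    """CCR computed separately for each known class."""
    totals, hits = {}, {}
    for s, c, k in zip(known_scores, known_correct, known_labels):
        totals[k] = totals.get(k, 0) + 1
        if c and s >= threshold:
            hits[k] = hits.get(k, 0) + 1
    return {k: hits.get(k, 0) / totals[k] for k in totals}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Assumption: larger values indicate a wider spread of per-class
    performance, i.e. more unfairness.
    """
    values = list(values)
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(values) / mean
```

Sweeping the threshold over all observed scores and collecting the resulting (FPR, CCR) pairs traces an OSCR curve; applying `coefficient_of_variation` to the values of `per_class_ccr` at each threshold gives one unfairness value per operating point.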