Robust Novelty Detection through Style-Conscious Feature Ranking

Stefan Smeu; Elena Burceanu; Emanuela Haller; Andrei Liviu Nicolicioiu

TL;DR

This work tackles robust novelty detection (ND) under style and content distribution shifts. It introduces Stylist, a two-step feature-ranking method: it first scores each feature by the distance between its distributions across training environments, then drops the highest-ranked, environment-biased features, which improves ND performance under covariate and sub-population shifts. Stylist operates on top of pretrained representations, is compatible with diverse feature extractors, and yields consistent gains on real (fMoW, DomainNet) and synthetic (COCOShift) benchmarks; the code is publicly released. The authors also provide COCOShift, a controllable synthetic benchmark for analyzing spurious correlations between style and content, and show that removing environment-biased features improves generalization for multiple ND algorithms. Overall, Stylist is a simple, interpretable way to reduce reliance on style cues while preserving the ability to detect semantic novelty.

Abstract

Novelty detection seeks to identify samples deviating from a known distribution, yet data shifts in a multitude of ways, and only a few consist of relevant changes. Aligned with out-of-distribution generalization literature, we advocate for a formal distinction between task-relevant semantic or content changes and irrelevant style changes. This distinction forms the basis for robust novelty detection, emphasizing the identification of semantic changes resilient to style distributional shifts. To this end, we introduce Stylist, a method that utilizes pretrained large-scale model representations to selectively discard environment-biased features. By computing per-feature scores based on feature distribution distances between environments, Stylist effectively eliminates features responsible for spurious correlations, enhancing novelty detection performance. Evaluations on adapted domain generalization datasets and a synthetic dataset demonstrate Stylist's efficacy in improving novelty detection across diverse datasets with stylistic and content shifts. The code is available at https://github.com/bit-ml/Stylist.
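The per-feature scoring described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that scores are based on feature distribution distances between environments, so the choice of the 1-D Wasserstein distance, the averaging over environment pairs, and the names `wasserstein_1d`, `stylist_scores`, and `rank_features` are all assumptions made here. Plain Python is used so the example runs without extra dependencies.

```python
from bisect import bisect_right
from itertools import combinations
from statistics import mean


def wasserstein_1d(xs, ys):
    """Empirical 1-D Wasserstein-1 distance: integral of |CDF_x - CDF_y|."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(xs + ys)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def stylist_scores(envs):
    """Score each feature by its mean distance over all environment pairs.

    `envs` is a list of environments; each environment is a list of
    feature vectors (one per sample). Averaging over pairs is an assumption.
    """
    n_features = len(envs[0][0])
    scores = []
    for j in range(n_features):
        columns = [[row[j] for row in env] for env in envs]
        pair_dists = [wasserstein_1d(a, b) for a, b in combinations(columns, 2)]
        scores.append(mean(pair_dists))
    return scores


def rank_features(envs):
    """Feature indices, most environment-biased (largest score) first."""
    scores = stylist_scores(envs)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 0 is stable across environments,
# feature 1 shifts strongly between them.
env_a = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]]
env_b = [[0.0, 5.0], [1.0, 5.1], [2.0, 5.2]]
print(rank_features([env_a, env_b]))
```

In this toy setup the shifted feature is ranked first, so it would be the first candidate for removal.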

Paper Structure (25 sections, 3 equations, 9 figures, 6 tables)

Introduction
Problem formulation
Our approach
Step 1. Feature ranking in training environments
Step 2. Features selection for Robust Novelty Detection
1. The feature extractor
2. Style changes more between environments
Experimental analysis
Robust Novelty Detection
Ablations
A glimpse of interpretability
Related work
Conclusions
Impact statement
Acknowledgments
...and 10 more sections

Figures (9)

Figure 1: Multi-env setup for the Robust Novelty Detection task.
Figure 2: Stylist. We improve the ND performance by identifying (Step 1) and gradually removing (Step 2) environment-biased features. From this point of view, higher distribution distances between environments proved to be a good indicator for ranking features.
Figure 3: Feature selection algorithms. From left to right on the horizontal axis, we remove features according to the ranking of each feature selection algorithm. As the spuriousness level of the train set increases ($a) \rightarrow b) \rightarrow c)$), the performance of Stylist (in black) increases, while the performance of other methods decreases. This proves that our approach is better at identifying environment-biased features responsible for the spurious correlations. The reported ROC-AUC performance is for the same OOD sets in all three plots.
Figure 4: Dataset spuriousness impact. We vary the train set spuriousness level between style and content for the two steps of our algorithm. a) same dataset for both steps; b) fixed dataset (COCOShift_balanced) for ND training in Step 2; c) fixed dataset (COCOShift_balanced) for feature ranking in Step 1. Our method always manages to improve the ND performance (w.r.t. the all-features baseline), even in degenerate cases like 95% (or no) spurious correlation, in only one or in both steps (see the positive slopes in all curves).
Figure 5: Feature Selection vs. Dimensionality Reduction (PCA). When comparing Stylist (black) with PCA (orange), we see that Stylist selection works better in all cases. Moreover, when combining the best selection percentage of Stylist with further dimensionality reduction using PCA (green), we observe an improvement (note that the green curve corresponds to different absolute numbers of features).
...and 4 more figures
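
The captions above describe removing ranked features and reporting ROC-AUC for novelty detection. The sketch below illustrates that evaluation loop under stated assumptions: the nearest-neighbour novelty scorer, the pairwise ROC-AUC computation, the toy data, and the names `keep_features`, `knn_novelty_scores`, and `roc_auc` are hypothetical choices made for illustration, and the feature ranking is taken as given rather than computed.

```python
import math


def keep_features(rows, ranking, drop_fraction):
    """Drop the first `drop_fraction` of `ranking` (most biased first)."""
    n_drop = int(len(ranking) * drop_fraction)
    kept = sorted(ranking[n_drop:])
    return [[row[j] for j in kept] for row in rows]


def knn_novelty_scores(train, test, k=1):
    """Novelty score = mean Euclidean distance to the k nearest train samples."""
    scores = []
    for x in test:
        dists = sorted(math.dist(x, t) for t in train)
        scores.append(sum(dists[:k]) / k)
    return scores


def roc_auc(scores, labels):
    """ROC-AUC by pairwise comparison; label 1 = novel, 0 = normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): feature 0 carries content, feature 1 carries style.
train = [[0.0, 0.0], [0.2, 1.0], [0.1, 0.0], [0.3, 1.0]]
test = [[0.1, 9.0], [0.2, 4.0],   # normal content, unseen style
        [3.0, 1.0], [3.1, 0.0]]   # novel content, seen style
labels = [0, 0, 1, 1]
ranking = [1, 0]  # assumed output of Step 1: feature 1 is most biased

auc_all = roc_auc(knn_novelty_scores(train, test), labels)
auc_sel = roc_auc(
    knn_novelty_scores(keep_features(train, ranking, 0.5),
                       keep_features(test, ranking, 0.5)),
    labels,
)
print(auc_all, auc_sel)
```

In this constructed example the style feature dominates distances when all features are kept, so style-shifted normal samples look more novel than truly novel ones; dropping it reverses the ordering. Real data would show far less extreme behaviour.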
