A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts

Hossein Mirzaei, Mojtaba Nafez, Moein Madadi, Arad Maleki, Mahdi Hajialilue, Zeinab Sadat Taghavi, Sepehr Rezaee, Ali Ansari, Bahar Dibaei Nia, Kian Shamsaie, Mohammadreza Salehi, Mackenzie W. Mathis, Mahdieh Soleymani Baghshah, Mohammad Sabokrou, Mohammad Hossein Rohban

TL;DR

This work tackles novelty detection under style shifts, where test-time style variations dilute discriminative core features. It introduces a data-centric pipeline that crafts an auxiliary OOD set by distorting core regions identified through Grad-CAM, and trains a teacher-student model with an OOD-aware contrastive loss to prioritize core features over style cues. The method yields substantial improvements in both robust and standard ND performance across diverse real-world and synthetic datasets, without relying on environment metadata. These results demonstrate practical robustness to style shifts, with broad applicability to safety-critical domains such as autonomous driving and medical imaging. The paper also includes thorough ablations that validate the effectiveness of each component, and it offers a scalable framework for metadata-free robust ND.

Abstract

There have been several efforts to improve Novelty Detection (ND) performance. However, ND methods often suffer significant performance drops under minor distribution shifts caused by changes in the environment, known as style shifts. This challenge arises from the ND setup, where the absence of out-of-distribution (OOD) samples during training causes the detector to be biased toward the dominant style features in the in-distribution (ID) data. As a result, the model mistakenly learns to correlate style with core features, using this shortcut for detection. Robust ND is crucial for real-world applications like autonomous driving and medical imaging, where test samples may have different styles than the training data. Motivated by this, we propose a robust ND method that crafts an auxiliary OOD set with style features similar to the ID set but with different core features. Then, a task-based knowledge distillation strategy is utilized to distinguish core features from style features and help our model rely on core features for discriminating crafted OOD and ID sets. We verified the effectiveness of our method through extensive experimental evaluations on several datasets, including synthetic and real-world benchmarks, against nine different ND methods.
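The OOD-crafting idea described above (keep the style, alter the core) can be illustrated with a minimal sketch. It assumes a saliency map is already available, for example from Grad-CAM as mentioned in the TL;DR. The function name `craft_ood`, the `core_fraction` threshold, and the choice of uniform noise as the distortion are illustrative assumptions, not the authors' actual procedure.

```python
import random


def craft_ood(image, saliency, core_fraction=0.25, seed=0):
    """Distort the most salient ("core") pixels and leave the rest untouched.

    image, saliency: equally sized 2D lists of floats in [0, 1].
    core_fraction: hypothetical fraction of pixels treated as core.
    Returns (ood_image, core_coordinates); the input image is not modified.
    """
    h, w = len(image), len(image[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    # Rank pixels by saliency (highest first); the top fraction is the core region.
    coords.sort(key=lambda rc: saliency[rc[0]][rc[1]], reverse=True)
    k = max(1, int(core_fraction * h * w))
    core = coords[:k]

    rng = random.Random(seed)
    ood = [row[:] for row in image]  # copy, so style pixels stay identical
    for r, c in core:
        # Assumed distortion: overwrite core pixels with uniform noise.
        ood[r][c] = rng.random()
    return ood, set(core)
```

Because only the high-saliency pixels change, the crafted sample shares its low-saliency (style) pixels with the ID image, which is the property the abstract asks of the auxiliary OOD set.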

Paper Structure

This paper contains 39 sections, 8 equations, 11 figures, 18 tables, 1 algorithm.

Figures (11)

  • Figure 1: Evaluating Robust Novelty Detection Performance: A Comparative Study on the Cityscapes and GTA5 datasets, which both have similar core features but exhibit different style features. Each method has been trained on ID samples from the Cityscapes training dataset, and its performance has been reported on the test sets of Cityscapes (blue bar) and GTA5 (orange bar). This highlights the superior performance of our method in contrast to existing methods, which suffer from considerable performance drops. Comprehensive results are provided in Table \ref{tab:Main_table}.
  • Figure 2: Comparison of causal graphs: Our method, by intervening on $X_E$ and $X_C$, reduces the unwanted spurious correlation between $X_E$ and $Y$. Note that the graph in (b) depicts an ideal intervention where full independence between $X_E$ and $Y$ is achieved, which might not fully capture real-world complexities.
  • Figure 3: Overview of our framework for robust novelty detection: (A) Generation of an auxiliary OOD set by distorting core features of ID samples. (B) Architecture of the proposed pipeline featuring a pre-trained encoder (teacher) and a from-scratch encoder (student), both concatenated to a linear layer. (C) Training step aims to align the output of the student $f_s(\cdot)$ closely with the teacher’s output $f_t(\cdot)$ for $x_{\text{ID}}^1$ and $x_{\text{ID}}^2$, and to differentiate them for $x_{\text{OOD}}^1$ and $x_{\text{OOD}}^2$. (D) Green circles indicate pairs where the student's output is intended to be close to the teacher's output, red circles indicate pairs that are meant to diverge, and gray squares represent pairs that have been omitted from the loss function.
  • Figure 4: Examples of datasets used in the study, illustrating the concept of style shift. We selected the Brain Tumor dataset, Waterbirds, MVTecAD, Camelyon17, and Colored MNIST because they show this shift clearly. In each row, the left section shows 4 images from the training set of the main dataset, i.e., $\mathcal{D}^\text{train}$; the middle section shows the test set of the same dataset, i.e., $\mathcal{D}^\text{test}$; and the right section shows samples from the dataset containing the style shift, i.e., $\mathcal{D}'$. In the test sets (middle and right sections), OOD samples are marked with a red frame purely for readability; these frames are not present in the actual data. In the brain tumor datasets, images containing a tumor are labeled OOD and healthy brains are labeled ID, as shown in the figure. The brain images from the main dataset all include the skull, which appears as a curve around the brain, whereas the images from the shifted dataset do not (the skull may have been removed during preprocessing). This can lead the model to mistakenly learn the skull as an ID feature and thus label all images from the shifted dataset as OOD. The second row shows the Waterbirds dataset, which is fully explained in Appendix \ref{appendix:bench synthetic datasets}. Here, land birds are ID and water birds are OOD. In the main dataset (the two leftmost sections), every image has a land background; in the shifted dataset, every image has a water background (e.g., sea or lake). The goal is to train a model that is robust to background shifts and labels images according to their foreground, i.e., the type of bird. The third row shows the hazelnut class of the MVTecAD dataset, in which non-broken hazelnuts are ID and broken ones are OOD. For the shifted dataset, following the procedure for generating synthetic shifted pairs in Appendix \ref{appendix:syth_dist_shift}, we apply light augmentations to the image background, simulating a style shift in which the style feature is the background color. Finally, Camelyon17 is a lymph node section dataset explained in Appendix \ref{natural_dist_shift}. Here, the ID class represents healthy patients and the OOD class represents patients with cancerous cells. The shifted dataset has exactly the same setup, but its images were taken at a different center and so show minor shifts due to differences in equipment, angle, etc. In the figure, this shift appears as a slight change in color for both ID and OOD groups: the shifted images are generally darker. Note that the Colored MNIST dataset is displayed for intuition only.
  • Figure 5: Performance and Loss Comparison Across Different Setups on the Cityscapes Dataset: Figure (a) showcases the AUROC curves for four setups, highlighting that Setup C (Ours) not only converges more rapidly but also achieves superior performance relative to the others. Figure (b) presents the normalized loss, where Setup C demonstrates a notably stable loss profile. In contrast, Setups A and B display less stability, with fluctuations in their loss metrics. These comparisons underscore the efficiency and robustness of our approach in both performance and stability.
  • ...and 6 more figures
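
The training objective summarized in the Figure 3 caption (pull the student's output toward the teacher's for ID inputs, push it away for crafted OOD inputs) can be sketched as a simple pairwise loss. This is a simplified stand-in under stated assumptions, not the paper's OOD-aware contrastive loss: the cosine-similarity form, the `margin` parameter, and the function names `pair_loss` and `batch_loss` are all hypothetical.

```python
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)  # eps guards against zero vectors


def pair_loss(student_out, teacher_out, is_ood, margin=0.0):
    """Loss for one (student, teacher) output pair on the same input."""
    sim = cosine(student_out, teacher_out)
    if is_ood:
        # Crafted OOD input: penalize any similarity above the margin.
        return max(0.0, sim - margin)
    # ID input: penalize any dissimilarity between student and teacher.
    return 1.0 - sim


def batch_loss(student_outs, teacher_outs, ood_flags, margin=0.0):
    """Mean pairwise loss over a batch."""
    losses = [
        pair_loss(s, t, flag, margin)
        for s, t, flag in zip(student_outs, teacher_outs, ood_flags)
    ]
    return sum(losses) / len(losses)
```

In this sketch, a student that matches the teacher exactly incurs zero loss on ID inputs and the maximum loss on OOD inputs, which mirrors the green (attract) and red (repel) pairs in panel (D) of Figure 3.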