Familiarity-Based Open-Set Recognition Under Adversarial Attacks

Philip Enevoldsen, Christian Gundersen, Nico Lang, Serge Belongie, Christian Igel

TL;DR

This paper investigates the vulnerability of familiarity-based open-set recognition (OSR) scores, particularly the Maximum Logit Score (MLS) and the Maximum Softmax Probability (MSP), to gradient-based adversarial perturbations. It distinguishes False Familiarity and False Novelty attacks and analyzes their effectiveness in informed and uninformed settings on TinyImageNet. It also introduces the Adversarial Reaction Score (ARS) as a potential OSR metric. The findings show that MLS can be easily manipulated: informed attacks can reverse OSR rankings, and iterative methods deliver the strongest disruption. ARS offers limited improvement over MLS. The work highlights the need for robust scoring rules and targeted defense strategies to ensure reliable OSR performance in adversarial contexts, with implications for real-world deployment where novelty detection is critical.

Abstract

Open-set recognition (OSR), the identification of novel categories, can be a critical component when deploying classification models in real-world applications. Recent work has shown that familiarity-based scoring rules such as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are strong baselines when the closed-set accuracy is high. However, one potential weakness of familiarity-based OSR is its vulnerability to adversarial attacks. Here, we study gradient-based adversarial attacks on familiarity scores for both attack types, False Familiarity and False Novelty, and evaluate their effectiveness in informed and uninformed settings on TinyImageNet. Furthermore, we explore how novel and familiar samples react to adversarial attacks and formulate the adversarial reaction score as an alternative OSR scoring rule, which shows a high correlation with the MLS familiarity score.
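
To make the scoring rules and attack directions concrete, here is a minimal sketch in plain Python. It computes MSP and MLS from a logit vector and applies one FGSM-style step that raises MLS (the False Familiarity direction) or lowers it (the False Novelty direction). Everything beyond the MSP and MLS definitions is an assumption for illustration: the toy linear classifier, the helper names (`linear_logits`, `fgsm_on_mls`), the use of the maximum logit itself as the attack objective, and the omission of pixel-range clipping. It is not the objective or model used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp(logits):
    """Maximum Softmax Probability: the largest softmax output."""
    return max(softmax(logits))


def mls(logits):
    """Maximum Logit Score: the largest raw logit."""
    return max(logits)


def linear_logits(weights, bias, x):
    """Hypothetical toy classifier: logits = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sign(value):
    """Return -1, 0, or +1."""
    return (value > 0) - (value < 0)


def fgsm_on_mls(weights, bias, x, eps, direction):
    """One FGSM step on the MLS of the toy linear model (an assumption).

    direction=+1 raises the score (False Familiarity direction),
    direction=-1 lowers it (False Novelty direction).
    For a linear model, the gradient of the top logit w.r.t. x is
    simply the weight row of the arg-max class.
    """
    logits = linear_logits(weights, bias, x)
    top = max(range(len(logits)), key=lambda k: logits[k])
    grad = weights[top]
    return [xi + direction * eps * sign(g) for xi, g in zip(x, grad)]


# Toy two-class example with made-up numbers.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, 0.0]
x = [1.0, -1.0]

x_ffam = fgsm_on_mls(W, b, x, eps=0.1, direction=+1)
x_fnov = fgsm_on_mls(W, b, x, eps=0.1, direction=-1)

print("clean MLS:", mls(linear_logits(W, b, x)))
print("FFam  MLS:", mls(linear_logits(W, b, x_ffam)))
print("FNov  MLS:", mls(linear_logits(W, b, x_fnov)))
```

In this sketch the upward step always raises the top logit, while the downward step is only guaranteed to lower the logit of the original arg-max class; another class can take over as the maximum, which is one reason a single step may be weaker than an iterative attack.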

Paper Structure (11 sections, 7 equations, 7 figures)

Figures (7)

  • Figure 1: Adversarial attacks on OSR familiarity scores. Considering novel categories as positives, the top box depicts a false positive (FP) attack that lowers the familiarity of the known category, leading to a false novelty (FNov). In contrast, the bottom box depicts a false negative (FN) attack that increases the familiarity of a known category, leading to a missed novelty or false familiarity (FFam).
  • Figure 2: Qualitative example. Perturbed images (top) and adversarial perturbations (bottom).
  • Figure 3: Uninformed FGSM attacks. Fast gradient sign method (FGSM) attacks on TinyImageNet. Left: False Familiarity (FFam) attacks. Right: False Novelty (FNov) attacks. (a,b) The OSR ranking measured by AUROC. (c,d) Median Maximum Logit Score (MLS) of all samples (familiar and novel).
  • Figure 4: Informed FGSM attacks. Fast gradient sign method (FGSM) attacks on TinyImageNet. Left: False Familiarity (FFam) attacks. Right: False Novelty (FNov) attacks. (a,b) The OSR ranking measured by AUROC. (c,d) Median Maximum Logit Score (MLS) of novel samples (c) and familiar samples (d).
  • Figure 5: MLS for familiar and novel samples after adversarial perturbation. FGSM attacks on TinyImageNet. Median Maximum Logit Score (MLS) relative to the original scores, shown separately for the familiar and novel samples. The filled region spans the 25th to 75th percentiles. (a) False Familiarity (FFam) attacks with the max loss. (b) False Novelty (FNov) attacks with the 2-norm loss. These were the objectives that shifted the scores furthest up (FFam) or down (FNov) relative to the original scores (see Fig. \ref{subfig:fn_mls}, \ref{subfig:fp_mls}).
  • ...and 2 more figures
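
The captions above report the OSR ranking as AUROC with novel categories treated as positives. As a rough illustration, AUROC can be read as the probability that a randomly drawn novel sample receives a lower familiarity score than a randomly drawn familiar one, with ties counted as one half. The function name `auroc_novel_positive` and the example scores below are hypothetical and are not taken from the paper.

```python
def auroc_novel_positive(familiar_scores, novel_scores):
    """AUROC for familiarity scores (e.g. MLS) with novel samples as positives.

    A novel sample is ranked correctly against a familiar sample when its
    familiarity score is lower. Ties contribute 0.5. This pairwise form is
    quadratic in the number of samples and is meant only as a sketch.
    """
    wins = 0.0
    for n in novel_scores:
        for f in familiar_scores:
            if n < f:
                wins += 1.0
            elif n == f:
                wins += 0.5
    return wins / (len(novel_scores) * len(familiar_scores))


# Made-up scores: familiar samples mostly score higher than novel ones.
familiar = [5.0, 4.0, 6.0]
novel = [1.0, 2.0, 4.5]
print("clean ranking AUROC:", auroc_novel_positive(familiar, novel))

# If an attack pushed every novel score above every familiar score,
# the ranking would be fully reversed.
print("reversed ranking AUROC:", auroc_novel_positive([1.0, 2.0], [3.0, 4.0]))
```

Under this reading, an AUROC of 0.5 corresponds to a ranking no better than chance, and values below 0.5 correspond to a ranking that is at least partly reversed.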