
Adversarial Robustness in Zero-Shot Learning: An Empirical Study on Class and Concept-Level Vulnerabilities

Zhiyuan Peng, Zihan Ye, Shreyank N Gowda, Yuping Yan, Haotian Xu, Ling Shao

TL;DR

This work systematically studies adversarial robustness in zero-shot learning, revealing vulnerabilities at both the class and concept levels. It shows that the apparent success of standard class attacks in GZSL is spurious, an artifact of calibration, and introduces the Class-Bias Enhanced Attack (CBEA) to collapse GZSL accuracy across all calibration points. The authors further propose concept-focused attacks (CPconA and NCPconA) that disrupt intermediate concept predictions, and evaluate these attacks across multiple architectures and datasets. Overall, the paper highlights a critical robustness gap in current ZSL methods and argues for robust defenses and standardized adversarial benchmarks to ensure trustworthy ZSL deployment.

Abstract

Zero-shot Learning (ZSL) aims to enable image classifiers to recognize images from unseen classes that were not included during training. Unlike traditional supervised classification, ZSL typically relies on learning a mapping from visual features to predefined, human-understandable class concepts. While ZSL models promise improved generalization and interpretability, their robustness under systematic input perturbations remains unclear. In this study, we present an empirical analysis of the robustness of existing ZSL methods at both the class level and the concept level. Specifically, we successfully disrupt their class predictions with the well-known non-targeted class attack (clsA). However, in the Generalized Zero-shot Learning (GZSL) setting, we observe that clsA succeeds only at the original best calibration point. After the attack, the best calibration point shifts, and ZSL models retain relatively strong performance at other calibration points, indicating that the success of clsA in GZSL is spurious. To address this, we propose the Class-Bias Enhanced Attack (CBEA), which completely eliminates GZSL accuracy across all calibration points by enlarging the gap between seen and unseen class probabilities. Next, at the concept level, we introduce two novel attack modes: the Class-Preserving Concept Attack (CPconA) and the Non-Class-Preserving Concept Attack (NCPconA). Our extensive experiments evaluate three typical ZSL models with various architectures from the past three years and reveal that ZSL models are vulnerable not only to the traditional class attack but also to concept-based attacks. These attacks allow malicious actors to easily manipulate class predictions by erasing or introducing concepts. Our findings highlight a significant performance gap in existing approaches under attack, emphasizing the need to improve the adversarial robustness of current ZSL models.
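The idea of a non-targeted class attack on a Visual → Concept → Class pipeline can be illustrated on a toy model. The sketch below is not the authors' implementation: it assumes a linear concept predictor `W`, class scores computed as dot products with hypothetical class-concept vectors, and a single FGSM-style sign step on a margin loss under an L∞ budget `eps`. The actual loss, number of steps, and budget used for clsA in the paper are not given in this summary.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def class_scores(x: Sequence[float], W: Matrix, class_concepts: Matrix) -> Vector:
    """Visual -> Concept -> Class: concepts = W x, then score_c = <concepts, phi_c>."""
    concepts = matvec(W, x)
    return [sum(a * p for a, p in zip(concepts, phi)) for phi in class_concepts]


def argmax(v: Sequence[float]) -> int:
    return max(range(len(v)), key=lambda i: v[i])


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def margin_fgsm(x, y, W, class_concepts, eps):
    """Hypothetical one-step non-targeted attack: move x inside an L-inf ball of
    radius eps to raise the strongest wrong class score relative to class y."""
    scores = class_scores(x, W, class_concepts)
    runner_up = max((c for c in range(len(scores)) if c != y), key=lambda c: scores[c])
    # In this linear model d(score_c)/dx = W^T phi_c, so the gradient of the
    # margin (score_runner_up - score_y) is W^T (phi_runner_up - phi_y).
    diff = [a - b for a, b in zip(class_concepts[runner_up], class_concepts[y])]
    grad = matvec(transpose(W), diff)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy setup: 3 visual features, 2 concepts, 2 classes.
W = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
class_concepts = [[1.0, 0.0],   # class 0 is described by concept 0
                  [0.0, 1.0]]   # class 1 is described by concept 1
x = [0.8, 0.2, 0.3]

x_adv = margin_fgsm(x, y=0, W=W, class_concepts=class_concepts, eps=0.3)
print(argmax(class_scores(x, W, class_concepts)))      # 0
print(argmax(class_scores(x_adv, W, class_concepts)))  # 1
```

Because the gradient of the margin is zero for the middle feature, the sign step leaves it untouched; only features that move the concept predictions are perturbed. An iterative (PGD-style) variant would repeat this step with projection back into the `eps` ball.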

Paper Structure

This paper contains 21 sections, 20 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of the classification process in (a) traditional Supervised Learning (SL), (b) Zero-Shot Learning (ZSL), and (c) Generalized Zero-Shot Learning (GZSL). Unlike traditional SL, ZSL methods classify images through an intermediate concept prediction, i.e., $\text{Visual} \to \text{Concept} \to \text{Class}$. In GZSL, methods also tend to produce class scores biased toward seen classes, so a class calibration technique chao2016empirical is often employed to debias them: it uses a predefined hyper-parameter $\gamma$ to manually reduce the scores of seen classes.
  • Figure 2: Two groups of adversarial examples and the corresponding noise produced by the plain clsA and our CBEA. The malicious noise is imperceptible. Moreover, CBEA attacks successfully with fewer noise updates.
  • Figure 3: AUSUC comparison between the plain clsA and our CBEA on the PSVMA model. Adversarial examples produced by the plain clsA still retain a large area under the seen-unseen accuracy curve, whereas CBEA removes almost all of this area given enough noise updates.
  • Figure 4: Examples of our NCPconA-10 and CPconA-10 attacks. Even minor noise on the images can drastically change concept predictions, whether or not the class prediction is affected.
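Figures 1 and 3 rely on calibrated stacking (subtracting $\gamma$ from seen-class scores) and on AUSUC, the area under the seen-unseen accuracy curve traced by sweeping $\gamma$. The following is a minimal sketch of both, assuming raw per-class scores are already available. The helper names and toy scores are hypothetical, and for brevity it uses per-sample accuracy rather than the per-class averaged accuracy common in GZSL benchmarks.

```python
from typing import List, Sequence, Set, Tuple

Sample = Tuple[Sequence[float], int]  # (per-class scores, true label)


def calibrated_predict(scores: Sequence[float], seen: Set[int], gamma: float) -> int:
    """Calibrated stacking: subtract gamma from every seen-class score, then argmax."""
    adjusted = [s - gamma if c in seen else s for c, s in enumerate(scores)]
    return max(range(len(adjusted)), key=lambda c: adjusted[c])


def accuracy(samples: List[Sample], seen: Set[int], gamma: float) -> float:
    """Fraction of samples whose calibrated prediction matches the label."""
    if not samples:
        return 0.0
    hits = sum(calibrated_predict(s, seen, gamma) == y for s, y in samples)
    return hits / len(samples)


def ausuc(seen_samples: List[Sample], unseen_samples: List[Sample],
          seen: Set[int], gammas: Sequence[float]) -> float:
    """Area under the seen-unseen accuracy curve over a grid of gammas.

    Raising gamma can only help unseen samples and hurt seen ones, so walking
    the gammas in ascending order gives a monotone curve; the grid should be
    wide enough to reach both endpoints (all-seen and all-unseen predictions).
    """
    points = [(accuracy(unseen_samples, seen, g), accuracy(seen_samples, seen, g))
              for g in sorted(gammas)]
    area = 0.0
    for (u0, s0), (u1, s1) in zip(points, points[1:]):
        area += (u1 - u0) * (s0 + s1) / 2.0  # trapezoid rule
    return area


# Hypothetical toy scores: classes 0 and 1 are seen, 2 and 3 are unseen.
seen_classes = {0, 1}
seen_samples = [([0.9, 0.1, 0.3, 0.0], 0), ([0.2, 0.8, 0.5, 0.1], 1)]
unseen_samples = [([0.7, 0.1, 0.6, 0.0], 2), ([0.6, 0.2, 0.1, 0.4], 3)]
print(ausuc(seen_samples, unseen_samples, seen_classes,
            [-1.0, 0.0, 0.15, 0.35, 1.0]))  # 0.875
```

Recomputing `ausuc` on scores from adversarial inputs separates an attack that only shifts the best calibration point (a drop at one $\gamma$ but a large remaining area, which is how the summary describes clsA) from one that lowers accuracy at every $\gamma$, as CBEA is reported to do.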