Improved Feature Generating Framework for Transductive Zero-shot Learning

Zihan Ye, Xinyuan Ru, Shiming Chen, Yaochu Jin, Kaizhu Huang, Xiaobo Jin

TL;DR

This work tackles transductive zero-shot learning (TZSL) by identifying the unconditional unseen discriminator as a key source of prior-bias-induced degradation. It introduces I-VAEGAN, which pairs Pseudo-conditional Feature Adversarial (PFA) learning with Variational Embedding Regression (VER) to mitigate prior bias and improve semantic regression, respectively. Through a three-stage training regime and extensive experiments on AWA1/2, CUB, and SUN, I-VAEGAN achieves state-of-the-art TZSL and TGZSL performance across diverse unseen-class priors, while reducing Accumulated Prior Error. The proposed approach offers robust performance under unknown priors and demonstrates compatibility with existing TZSL frameworks, signaling practical impact for zero-shot recognition tasks where unseen priors are uncertain.

Abstract

Feature Generative Adversarial Networks have emerged as powerful generative models for producing high-quality representations of unseen classes in Zero-shot Learning (ZSL). This paper examines the pivotal influence of unseen class priors in transductive ZSL (TZSL) and shows that even a marginal prior bias can cause substantial accuracy declines. Our extensive analysis reveals that this degradation fundamentally stems from the use of an unconditional unseen discriminator - a core component in existing TZSL. We further establish that the detrimental effects of this component are inevitable unless the generator perfectly fits class-specific distributions. Building on these insights, we introduce our Improved Feature Generation Framework, termed I-VAEGAN, which incorporates two novel components: Pseudo-conditional Feature Adversarial (PFA) learning and Variational Embedding Regression (VER). PFA circumvents the need for prior estimation by explicitly injecting the predicted semantics as pseudo conditions for unseen classes, which presupposes precise semantic regression. Meanwhile, VER uses reconstructive pre-training to learn class statistics, yielding better semantic regression. Our I-VAEGAN achieves state-of-the-art TZSL accuracy across various benchmarks and priors. Our code will be released upon acceptance.
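The claim that an unconditional discriminator cannot prevent class-specific mismatch can be illustrated with a toy discrete example. The sketch below is not taken from the paper: the two-class setup, the binary feature, and all probabilities are hypothetical, and total variation distance is used only as a stand-in gap measure rather than the paper's accumulated prior error. It shows a generator sampled under a biased prior whose unconditional distribution matches the real one exactly while one class-conditional distribution is far off.

```python
import numpy as np

# Hypothetical toy setup (not from the paper):
# two unseen classes, one binary feature x in {0, 1}.
real_prior = np.array([0.8, 0.2])      # true unseen-class prior p_r(y)
real_cond = np.array([[1.0, 0.0],      # p_r(x | y=0)
                      [0.0, 1.0]])     # p_r(x | y=1)

# The generator is sampled under a biased (uniform) class prior.
assumed_prior = np.array([0.5, 0.5])
gen_cond = np.array([[1.0, 0.0],       # p_g(x | y=0), correct
                     [0.6, 0.4]])      # p_g(x | y=1), distorted

# Unconditional (marginal) distributions over x.
real_marginal = real_prior @ real_cond
gen_marginal = assumed_prior @ gen_cond

# Total variation distance: 0.5 * sum of absolute differences.
marginal_gap = 0.5 * np.abs(real_marginal - gen_marginal).sum()
class_gap = 0.5 * np.abs(real_cond - gen_cond).sum(axis=1)

print("marginal gap:", marginal_gap)
print("per-class gap:", class_gap)
```

In this toy case an unconditional discriminator sees identical distributions and has no signal left to correct the second class, which is the situation the abstract describes as inevitable unless the generator fits every class-specific distribution.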

Paper Structure

This paper contains 43 sections, 2 theorems, 28 equations, 9 figures, 8 tables, 1 algorithm.

Key Result

Proposition 3.1

When G gets the expected global optimum of the minimax game, i.e., $p_r(\mathbf{x}^{u}) = p_g(\mathbf{x}^{u})$, for any unseen class $y^u_{i}$, where $e(\mathbf{x}^{u}, y^u_i)$ is the accumulated prior error:

Figures (9)

  • Figure 1: The prior reaction chain we identify. Unseen prior probabilities first affect the unconditional unseen discriminator $D_{u}$. Then, $D_{u}$ gives insufficient gradient guidance to the generator $G$. Next, even if the unconditional generated distribution $p_g(\mathbf{x}^u)$ fully fits the real unconditional distribution $p_r(\mathbf{x}^u)$, the class-specific distributions still have an inevitable gap. Finally, the ZSL classifier misclassifies test samples. This chain motivates us to refine the unconditional $D_{u}$.
  • Figure 2: Illustration of our proposed PFA. (a) Standard unconditional feature adversarial learning can be negatively affected by prior bias and by unpaired classes of real and fake samples, while (b) our PFA mitigates both problems: priors and classes can be matched at the same time, provided that $R$ is accurate enough.
  • Figure 3: Illustration of our VER. (a) Plain regression [narayan2020latent, ye2023rebalanced] needs paired visual features and semantic labels. Thus, it only works on seen classes. (b) Adversarial regression [wang2023bi] can work on both seen and unseen classes. (c) Our VER pre-trains a VAE without supervision to model intra-class variations, then uses its embeddings to enhance (a) and (b) seamlessly.
  • Figure 4: The APE comparison for our I-VAEGAN and Bi-VAEGAN on (a) AWA1 and (b) AWA2. Our method effectively reduces APE.
  • Figure 5: The comparison for prior estimation in non-uniform datasets: AWA1 and AWA2.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Proposition 3.1: APE
  • proof
  • Remark 3.2
  • Proposition 1.1: APE
  • proof
  • Remark 1.2