
Data Distribution Distilled Generative Model for Generalized Zero-Shot Recognition

Yijie Wang, Mingjian Hong, Luwen Huangfu, Sheng Huang

TL;DR

This work tackles bias toward seen data in generalized zero-shot learning by recasting GZSL as an end-to-end problem that jointly models in-distribution and out-of-distribution data. It introduces the D$^3$GZSL framework comprising Feature Generation (FG), In-Distribution Dual-Space Distillation (ID$^2$SD), and Out-of-Distribution Batch Distillation (O$^2$DBD); the framework optimizes a combined objective that includes $\mathcal{L}_{gen}$, $\mathcal{L}_{id}$, and $\mathcal{L}_{od}$. ID$^2$SD aligns teacher–student distributions in embedding and label spaces, while O$^2$DBD learns a low-dimensional OOD representation per batch and models cross-sample correlations to capture shared structure between seen and unseen classes. Empirical results on four GZSL benchmarks show consistent improvements over strong generative baselines, and the approach remains compatible with GAN, VAE, and diffusion-based generators, highlighting its practical impact for robust zero-shot recognition.

Abstract

In the realm of Zero-Shot Learning (ZSL), we address the bias of Generalized Zero-Shot Learning (GZSL) models toward seen data. To counter this, we introduce an end-to-end generative GZSL framework called D$^3$GZSL. This framework treats seen and synthesized unseen data as in-distribution and out-of-distribution data, respectively, for a more balanced model. D$^3$GZSL comprises two core modules: in-distribution dual space distillation (ID$^2$SD) and out-of-distribution batch distillation (O$^2$DBD). ID$^2$SD aligns teacher-student outcomes in embedding and label spaces, enhancing learning coherence. O$^2$DBD introduces low-dimensional out-of-distribution representations per batch sample, capturing shared structures between seen and unseen categories. Our approach demonstrates its effectiveness across established GZSL benchmarks and integrates into mainstream generative frameworks. Extensive experiments consistently show that D$^3$GZSL improves the performance of existing generative GZSL methods, underscoring its potential to refine zero-shot learning practices. The code is available at: https://github.com/PJBQ/D3GZSL.git
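
To make the idea of aligning teacher and student outputs in two spaces more concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the choice of mean squared error for the embedding space, KL divergence for the label space, and the weights `alpha` and `beta` are all assumptions made for illustration, as are the function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_loss(z_teacher, z_student):
    """Embedding-space term: mean squared difference (assumed form)."""
    diffs = [(t - s) ** 2 for t, s in zip(z_teacher, z_student)]
    return sum(diffs) / len(diffs)


def label_loss(teacher_logits, student_logits, temperature=1.0):
    """Label-space term: KL(teacher || student) on softmax outputs (assumed form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_space_distillation(z_t, z_s, logits_t, logits_s, alpha=1.0, beta=1.0):
    """Weighted sum of both terms; alpha and beta are hypothetical weights."""
    return alpha * embedding_loss(z_t, z_s) + beta * label_loss(logits_t, logits_s)
```

When the student reproduces the teacher exactly, both terms vanish; any mismatch in either the embeddings or the class probabilities yields a positive loss, which is the behavior a distillation objective of this kind relies on.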

Paper Structure (17 sections, 12 equations, 3 figures, 4 tables)


Figures (3)

  • Figure 1: A schematic view of the bias concerning seen classes (source) in the visual space.
  • Figure 2: Two-stage classification method based on out-of-distribution (OOD) detection. Stage one: an OOD detector performs binary classification of the input data into seen and unseen categories. Stage two: two expert classifiers separately classify the samples that the OOD detector identifies as seen and as unseen, respectively.
  • Figure 3: The structure of our D$^3$GZSL framework. The FG is our baseline model, which is a generative ZSL method. In ID$^2$SD, we learn two embedding functions, $E_o$ and $E_s$, that map the visual samples $x$ into the embedding space as $z=E(x)$. $C_o$ and $C_s$ are the classifier networks of the teacher and student architectures, respectively. $f$ is a softmax function. In O$^2$DBD, $O$ is the OOD scoring method, $H$ is a mapping function that maps the softmax probability of the student network to the OOD representation embedding space, and $S$ transforms the out-of-distribution detection scores into the OOD representation space.
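
The two-stage pipeline described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative sketch only: using the maximum softmax probability over seen classes as the OOD score, the `threshold` value, and all function and parameter names are assumptions, and the expert classifiers are represented simply by logits they would already have produced.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


def ood_score(seen_logits):
    """Assumed OOD score: max softmax probability over seen classes.

    A low value suggests the sample does not belong to any seen class.
    """
    return max(softmax(seen_logits))


def two_stage_predict(seen_logits, unseen_logits,
                      seen_classes, unseen_classes, threshold=0.5):
    # Stage one: binary seen/unseen decision from the OOD score.
    if ood_score(seen_logits) >= threshold:
        # Stage two (seen expert): pick the best seen class.
        return seen_classes[argmax(seen_logits)]
    # Stage two (unseen expert): pick the best unseen class.
    return unseen_classes[argmax(unseen_logits)]
```

For instance, a sample whose seen-class logits are sharply peaked is routed to the seen expert, while a sample with nearly flat seen-class logits falls below the threshold and is handed to the unseen expert. The hard routing in stage one is what makes such a pipeline sensitive to detector errors.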