Data Distribution Distilled Generative Model for Generalized Zero-Shot Recognition

Yijie Wang; Mingjian Hong; Luwen Huangfu; Sheng Huang

TL;DR

This work tackles bias toward seen data in generalized zero-shot learning by recasting GZSL as an end-to-end problem that jointly models in-distribution and out-of-distribution data. It introduces the D$^3$GZSL framework, comprising Feature Generation (FG), In-Distribution Dual-Space Distillation (ID$^2$SD), and Out-of-Distribution Batch Distillation (O$^2$DBD); the framework optimizes a combined objective that includes $\mathcal{L}_{gen}$, $\mathcal{L}_{id}$, and $\mathcal{L}_{od}$. ID$^2$SD aligns teacher-student distributions in embedding and label spaces, while O$^2$DBD learns a low-dimensional OOD representation per batch and models cross-sample correlations to capture shared structure between seen and unseen classes. Empirical results on four GZSL benchmarks show consistent improvements over strong generative baselines, and the approach remains compatible with GAN, VAE, and diffusion-based generators, highlighting its practical impact for robust zero-shot recognition.
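
One plausible reading of the combined objective is a weighted sum of the three terms. The weights $\lambda_{id}$ and $\lambda_{od}$ below are hypothetical symbols introduced only for illustration; this summary does not state how the terms are balanced, so the form should be taken as an assumption rather than the paper's exact formula.

$$
\mathcal{L} = \mathcal{L}_{gen} + \lambda_{id}\,\mathcal{L}_{id} + \lambda_{od}\,\mathcal{L}_{od}
$$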

Abstract

In the realm of Zero-Shot Learning (ZSL), we address biases in Generalized Zero-Shot Learning (GZSL) models, which favor seen data. To counter this, we introduce an end-to-end generative GZSL framework called D$^3$GZSL. This framework treats seen and synthesized unseen data as in-distribution and out-of-distribution data, respectively, for a more balanced model. D$^3$GZSL comprises two core modules: in-distribution dual space distillation (ID$^2$SD) and out-of-distribution batch distillation (O$^2$DBD). ID$^2$SD aligns teacher-student outcomes in embedding and label spaces, enhancing learning coherence. O$^2$DBD introduces low-dimensional out-of-distribution representations per batch sample, capturing shared structures between seen and unseen categories. Our approach demonstrates its effectiveness across established GZSL benchmarks, seamlessly integrating into mainstream generative frameworks. Extensive experiments consistently showcase that D$^3$GZSL elevates the performance of existing generative GZSL methods, underscoring its potential to refine zero-shot learning practices. The code is available at: https://github.com/PJBQ/D3GZSL.git
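
The idea of aligning teacher and student outputs in both the embedding space and the label space can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which distances are used, so mean squared error on embeddings and a KL divergence on softmax outputs are swapped in as common distillation choices. The function names and the weights `alpha` and `beta` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_loss(z_teacher, z_student):
    """Embedding-space term: mean squared error (an assumed choice)."""
    return sum((t - s) ** 2 for t, s in zip(z_teacher, z_student)) / len(z_teacher)


def label_loss(logits_teacher, logits_student, temperature=1.0):
    """Label-space term: KL(teacher || student) on softmax outputs (an assumed choice)."""
    p = softmax(logits_teacher, temperature)
    q = softmax(logits_student, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_space_loss(z_t, z_s, logits_t, logits_s, alpha=1.0, beta=1.0):
    """Weighted sum of both terms; alpha and beta are hypothetical weights."""
    return alpha * embedding_loss(z_t, z_s) + beta * label_loss(logits_t, logits_s)
```

When the student reproduces the teacher exactly, both terms vanish, and any mismatch in either space makes the loss positive. That is the behavior a dual-space alignment term needs, whatever distances the paper actually uses.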

Paper Structure (17 sections, 12 equations, 3 figures, 4 tables)

Introduction
Related Work
Methodology
Problem Statement
D$^3$GZSL Framework
Feature Generation (FG).
In-Distribution Dual-Space Distillation (ID$^2$SD).
Out-of-Distribution Batch Distillation (O$^2$DBD).
Model Optimization
Experiment
Comparisons with Previous Methods
Ablation Study
Training Strategy Analysis.
Component Analysis.
OOD Scoring Strategy Analysis.
...and 2 more sections

Figures (3)

Figure 1: A schematic view of the bias concerning seen classes (source) in the visual space.
Figure 2: Two-stage classification method based on OOD detection. Stage one: OOD detector performs binary classification of the input data into seen and unseen categories. Stage two: Two expert classifiers separately classify the samples that the Out-Of-Distribution (OOD) detector identifies as seen and unseen categories.
Figure 3: The structure of our D$^3$GZSL framework. FG is our baseline model, a generative ZSL method. In ID$^2$SD, we learn two embedding functions, $E_o$ and $E_s$, that map a visual sample $x$ into the embedding space as $z=E(x)$. $C_o$ and $C_s$ are the classifier networks of the teacher and student architectures, respectively, and $f$ is a softmax function. In O$^2$DBD, $O$ is the OOD scoring method, $H$ is a mapping function that maps the softmax probabilities of the student network into the OOD representation embedding space, and $S$ transforms the out-of-distribution detection scores into the OOD representation space.
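
The two-stage scheme described in the Figure 2 caption (an OOD detector followed by two expert classifiers) can be sketched as follows. This illustrates only that routing logic, not the end-to-end D$^3$GZSL framework. The caption does not specify the detector, so the maximum softmax probability is used here as an assumed stand-in score; `threshold` and the class names in the usage note are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ood_score(seen_logits):
    """Assumed OOD score: max softmax probability over seen classes.
    Higher values suggest the sample is in-distribution (seen)."""
    return max(softmax(seen_logits))


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def two_stage_predict(seen_logits, unseen_logits,
                      seen_classes, unseen_classes, threshold=0.5):
    """Stage one: gate the sample as seen or unseen with the OOD score.
    Stage two: let the matching expert classifier pick the class."""
    if ood_score(seen_logits) >= threshold:
        return "seen", seen_classes[argmax(seen_logits)]
    return "unseen", unseen_classes[argmax(unseen_logits)]
```

For example, with hypothetical classes `["cat", "dog", "bird"]` (seen) and `["zebra", "okapi"]` (unseen), a confident seen-class logit vector such as `[5.0, 0.0, 0.0]` is routed to the seen expert, while a flat one such as `[0.1, 0.0, 0.05]` falls below the threshold and is handed to the unseen expert.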
