Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations

Yuli Slavutsky; Yuval Benjamini

TL;DR

This work tackles the challenge of class distribution shifts in zero-shot learning when the shifting attribute is unknown. It develops a parametric model showing that ERM can fail even when $P(z|c)$ is unchanged, and introduces a robust learning framework that builds diverse synthetic environments via hierarchical sampling and enforces cross-environment balance using a performance-based penalty (VarAUC). Empirical results on simulations and real datasets demonstrate improved generalization to shifted class distributions without sacrificing in-distribution performance, with statistically significant gains on CelebA and ETHEC. The approach reframes class distribution shifts as an OOD environment-balancing problem in zero-shot settings, offering a practical route to more robust open-world verification systems.

Abstract

Zero-shot learning methods typically assume that the new, unseen classes encountered during deployment come from the same distribution as the classes in the training set. However, real-world scenarios often involve class distribution shifts (e.g., in age or gender for person identification), posing challenges for zero-shot classifiers that rely on representations learned from training classes. In this work, we propose and analyze a model that assumes that the attribute responsible for the shift is unknown in advance. We show that in this setting, standard training may lead to non-robust representations. To mitigate this, we develop an algorithm for learning robust representations in which (a) synthetic data environments are constructed via hierarchical sampling, and (b) environment balancing penalization, inspired by out-of-distribution problems, is applied. We show that our algorithm improves generalization to diverse class distributions in both simulations and experiments on real-world datasets.
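To make the environment-balancing idea in (b) concrete, the following is a minimal sketch of a performance-based penalty in the spirit of VarAUC. It assumes that the penalty is the variance of per-environment AUC added to the average loss; the function names, the use of population variance, the weight `lam`, and the exact (non-differentiable) AUC are illustrative assumptions rather than the authors' implementation.

```python
from statistics import mean, pvariance


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.

    Scores are similarities: same-class pairs should score higher.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def var_auc_objective(env_losses, env_aucs, lam=1.0):
    """Average loss plus lam times the variance of AUC across environments.

    Assumption: population variance is used; lam is a hypothetical
    trade-off hyperparameter.
    """
    return mean(env_losses) + lam * pvariance(env_aucs)


# Two synthetic environments with hypothetical pair scores and losses.
env_scores = [
    ([0.9, 0.8], [0.1, 0.2]),  # positives always outrank negatives
    ([0.6, 0.3], [0.5, 0.2]),  # one positive is outranked by a negative
]
env_aucs = [auc(pos, neg) for pos, neg in env_scores]
objective = var_auc_objective([0.25, 0.5], env_aucs, lam=2.0)
```

A representation that performs equally well in every environment incurs no penalty, while one that is accurate only in majority-dominated environments is penalized. A trainable version would need a differentiable surrogate for AUC, which this sketch does not include.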

Paper Structure

This paper contains 30 sections, 2 theorems, 57 equations, 11 figures, 5 tables, 1 algorithm.

Introduction
Problem Setup
Background on Environment Balancing Methods in OOD Generalization
Invariant risk minimization (IRM)
Calibration Loss Over Environments (CLOvE)
Variance Risk Extrapolation (VarREx)
Parametric Model of Class Distribution Shifts in Zero-Shot Learning
Proposed Approach
Synthetic Environments
Environment Balancing Algorithm for Class Distribution Shifts
Balancing Performance Instead of Loss
How Many Environments Are Needed?
Empirical Results
Simulations: Revisiting the Parametric Model
Experiments on Real Data
...and 15 more sections

Key Result

Proposition 1

Consider a weight representation $g(z)=Wz$, where $W \in \mathbb{R}^{d \times d}$ is a diagonal matrix, and the squared Euclidean distance $d_{g}(z_{i},z_{j})=\left\Vert W(z_{i}-z_{j})\right\Vert^{2}$. Let $W^* =\text{diag}(w^*) \in \arg \min_{W} \mathbb{E}\left[\widetilde{\ell}
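As a small illustration of the distance used in the proposition, the sketch below evaluates $d_g(z_i, z_j)$ for a diagonal $W = \text{diag}(w)$. The function name and the example vectors are hypothetical; only the formula comes from the statement above.

```python
def weighted_sq_distance(w, z_i, z_j):
    """Squared Euclidean distance ||W (z_i - z_j)||^2 with W = diag(w)."""
    return sum((w_k * (a - b)) ** 2 for w_k, a, b in zip(w, z_i, z_j))


# Hypothetical example: the second coordinate has weight 0, so the large
# difference along that axis does not contribute to the distance.
w = [1.0, 0.0, 2.0]
z_i = [1.0, 5.0, 1.0]
z_j = [0.0, -3.0, 0.0]
dist = weighted_sq_distance(w, z_i, z_j)
```

Because $W$ is diagonal, each weight $w_k$ only rescales its own coordinate, so choosing $w$ amounts to deciding how much each axis contributes to separating classes.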

Figures (11)

Figure 1: Illustration of the parametric model. Classes of each type are best separated along specific axes: classes of type $a_1$ along the red axis ($z^{(1)}$) and classes of type $a_2$ along the green axis ($z^{(2)}$). On axis $z^{(0)}$ both types can be separated but not as effectively as on their respective optimal axes.
Figure 2: Optimal weights. Top row: $d_0$ is fixed, $d_1$ and $d_2$ vary. Middle and bottom rows: $d_0, d_1, d_2$ are fixed. Middle: $\nu_0/\nu^-$ varies. Bottom: $\nu_0/\nu^+$ varies.
Figure 3: Illustration of the proposed hierarchical sampling. Top: $N_c=6$ classes, with 2 minority-type classes D, F (in purple). Middle: synthetic environments formed by sampling small ($k=3$) class subsets; in $1/5$ of the environments, minority-type classes become the majority constituting $2/3$ of the classes. Bottom: sampling $r=1$ positive and $r=1$ negative pairs for each class in the environment.
Figure 4: Average AUC over 10 simulation repetitions for majority attribute proportion $\rho=0.9$ in training (and 0.1 in test). Solid lines: distribution-shift. Dashed lines: in-distribution. Our method improves robustness for shifts, without compromising training distribution results.
Figure 5: Average feature importance for $\rho=0.9$, 10 repetitions. Our VarAUC penalty favors shared features (blocks 1 and 3), while deprioritizing majority features (block 2). All methods assign low weight to noise features (block 4).
...and 6 more figures
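The hierarchical sampling illustrated in Figure 3 can be sketched as follows. The first function enumerates all size-$k$ class subsets and computes the fraction in which minority-type classes form the majority, reproducing the $1/5$ stated in the caption for $N_c=6$, two minority classes, and $k=3$. The second function draws one synthetic environment and then $r$ positive and $r$ negative pairs per class. The class labels A to F, the function names, and the choice to draw negative partners from within the same environment are assumptions made for illustration.

```python
import random
from fractions import Fraction
from itertools import combinations


def minority_majority_fraction(classes, minority, k):
    """Fraction of size-k class subsets in which minority-type classes dominate."""
    envs = list(combinations(classes, k))
    flipped = [e for e in envs if 2 * sum(c in minority for c in e) > k]
    return Fraction(len(flipped), len(envs))


def sample_environment(data, k, r, rng):
    """Sample k classes, then r positive and r negative pairs for each class.

    data maps a class label to a list of its data points.
    Assumption: the negative partner comes from another class in the
    same environment.
    """
    env = rng.sample(sorted(data), k)
    pairs = []
    for c in env:
        for _ in range(r):
            a, b = rng.sample(data[c], 2)  # two distinct points, same class
            pairs.append((a, b, 1))
            other = rng.choice([o for o in env if o != c])
            pairs.append((rng.choice(data[c]), rng.choice(data[other]), 0))
    return env, pairs


classes = ["A", "B", "C", "D", "E", "F"]
minority = {"D", "F"}
frac = minority_majority_fraction(classes, minority, k=3)

# Hypothetical data points, labeled by class letter plus an index.
data = {c: [f"{c}{i}" for i in range(4)] for c in classes}
env, pairs = sample_environment(data, k=3, r=1, rng=random.Random(0))
```

Of the 20 possible three-class subsets, 4 contain both D and F, so `frac` equals `Fraction(1, 5)`. Small subsets are what make such majority-flipped environments appear with non-negligible probability.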

Theorems & Definitions (4)

Proposition 1
Lemma 1
proof
proof
