Incentivizing Generative Zero-Shot Learning via Outcome-Reward Reinforcement Learning with Visual Cues

Wenjin Hou, Xiaoxiao Sun, Hehe Fan

Abstract

Recent advances in zero-shot learning (ZSL) have demonstrated the potential of generative models. Typically, generative ZSL synthesizes visual features conditioned on semantic prototypes to model the data distribution of unseen classes, and then trains a classifier on the synthesized data. However, the synthesized features often remain task-agnostic, leading to degraded performance. Moreover, semantic prototypes alone are insufficient for inferring a faithful distribution when classes are semantically similar but visually distinct. To address these issues and advance ZSL, we propose RLVC, an outcome-reward reinforcement learning (RL) framework with visual cues for generative ZSL. At its core, RL empowers the generative model to self-evolve, implicitly enhancing its generation capability. In particular, RLVC updates the generative model using an outcome-based reward, encouraging the synthesis of task-relevant features. Furthermore, we introduce class-wise visual cues that (i) align synthesized features with visual prototypes and (ii) stabilize the RL training updates. For the training process, we present a novel cold-start strategy. Comprehensive experiments and analyses on three prevalent ZSL benchmarks demonstrate that RLVC achieves state-of-the-art results with a 4.7% gain.

Paper Structure (18 sections, 14 equations, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: Motivating illustration. (a) Existing generative ZSL methods train with adversarial losses conditioned only on semantic prototypes, which often leads to task-agnostic synthesized features and inter-class overlap. (b) Our RLVC updates the generative model using an RL reward and visual cues, yielding synthesized features that remain task-relevant and faithfully represent the data distribution.
  • Figure 2: Model architecture and training of RLVC. The top panel shows how we train the reward model with a visual encoder to produce fine-tuned visual features and reward signals. The bottom panel depicts how we update the policy model $G_\theta$ (i.e., generator) via outcome-reward reinforcement learning (blue arrows) and visual cues (green arrows), enabling synthesized features that remain task-relevant and faithfully represent the data distribution. $\mathbf{x}_{0}$ and $\tilde{\mathbf{x}}_{0}$ denote the real and synthesized features of seen classes, respectively.
  • Figure 3: Training trends of RLVC on CUB: raw reward, EMA-adjusted advantage, and ZSL accuracy.
  • Figure 4: Qualitative t-SNE visualization of RLVC on CUB: (a) without RL and visual cues, (b) without visual cues, and (c) full RLVC. We use real features of seen classes and synthetic features of unseen classes. Zoom in for details.
  • Figure 5: Effect of hyperparameters on CUB, including the epoch of RL cold-start, the coefficient of visual loss, and the number of synthetic unseen samples.
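
The captions above mention an EMA-adjusted advantage and class-wise visual cues without giving formulas at this point. The sketch below illustrates one plausible reading in plain Python; it is not the paper's implementation. It assumes that the advantage is the raw reward minus an exponential-moving-average baseline, that a visual prototype is the per-class mean of real features, and that the visual-cue term is a mean squared distance to that prototype. The function names, the `decay` parameter, and its default value are all hypothetical.

```python
def ema_advantages(rewards, decay=0.9):
    """Subtract a running EMA baseline from each raw reward (assumed form)."""
    baseline = None
    advantages = []
    for r in rewards:
        if baseline is None:
            baseline = r  # assumption: seed the baseline with the first reward
        advantages.append(r - baseline)
        # Update the baseline after computing the advantage for this step.
        baseline = decay * baseline + (1.0 - decay) * r
    return advantages


def class_prototypes(features, labels):
    """Per-class mean of real features (assumed definition of a visual prototype)."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def visual_cue_loss(synth_features, label, prototypes):
    """Mean squared distance between synthesized features and the class prototype."""
    proto = prototypes[label]
    total = 0.0
    for feat in synth_features:
        total += sum((f - p) ** 2 for f, p in zip(feat, proto)) / len(proto)
    return total / len(synth_features)


if __name__ == "__main__":
    protos = class_prototypes([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1])
    print(protos)
    print(visual_cue_loss([[2.0, 3.0]], 0, protos))
    print(ema_advantages([0.0, 1.0], decay=0.5))
```

With a constant reward the advantage stays at zero under this baseline, so only changes in reward relative to its recent average would drive the generator update. That is one way such a baseline could reduce variance, though the paper's exact formulation may differ.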