Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis

Chirag Vashist, Shichong Peng, Ke Li

TL;DR

The paper addresses few-shot image synthesis by identifying a latent-space misalignment in IMLE: the latent codes selected during training (from samples drawn from $\mathcal{N}(0,I)$) do not match the latent codes drawn at test time. It introduces RS-IMLE, which designs a target prior $\mathcal{P}$ and uses rejection sampling to align training with inference. The method is backed by a theoretical analysis that links the misalignment to distance CDFs and gives an explicit PDF relation involving $\phi(t)$. RS-IMLE improves image quality and mode coverage across nine datasets, with an average FID reduction of approximately 45.9% over strong baselines, and is supported by quantitative metrics and qualitative assessments such as Visual Recall and latent-space interpolation. The method makes better use of limited data and offers a practical, scalable strategy for prior design. The quantities $\mathcal{N}(0,I)$, $\mathcal{P}$, $m$, $n$, and $\epsilon$ are central to the construction and evaluation of the framework.

Abstract

An emerging area of research aims to learn deep generative models with limited training data. Prior generative models like GANs and diffusion models require a lot of data to perform well, and their performance degrades when they are trained on only a small amount of data. A recent technique called Implicit Maximum Likelihood Estimation (IMLE) has been adapted to the few-shot setting, achieving state-of-the-art performance. However, current IMLE-based approaches encounter challenges due to inadequate correspondence between the latent codes selected for training and those drawn during inference. This results in suboptimal test-time performance. We theoretically show a way to address this issue and propose RS-IMLE, a novel approach that changes the prior distribution used for training. This leads to substantially higher quality image generation compared to existing GAN and IMLE-based methods, as validated by comprehensive experiments conducted on nine few-shot image datasets.

Paper Structure

This paper contains 22 sections, 9 equations, 11 figures, 5 tables, and 1 algorithm.

Figures (11)

  • Figure 1: IMLE is an implicit generative model that maps a latent code sampled from a prior distribution to an image output. In previous IMLE-based methods, both the training and testing phases adopt a standard normal distribution as the prior distribution. However, this approach often results in poor generalization during inference. To address this limitation, we introduce RS-IMLE, which uses rejection sampling to alter the prior distribution used for training to a different distribution $\mathcal{P}$. This modification significantly enhances the quality of generated images during testing.
  • Figure 2: Difference between the latent codes picked by IMLE and RS-IMLE over the course of training.
  • Figure 3: Comparison of diffusion model performance in the large-scale and few-shot settings. We have two 2D datasets of the same shape (infinity symbol) but with different numbers of data points: 10K data points and 20 data points. We train the same model on each but get very different performance. In the few-shot case (20 data points), the diffusion model fails to learn a distribution that matches the data distribution. Data points are denoted by ${\color{ForestGreen}\mdblksquare}$ and samples are denoted by ${\color{circlecolor}\mdblkcircle}$.
  • Figure 4: Illustration of the behaviour of $F_{{D}^{*}_{i}}(t)$ and $F_{{D}_{i1}}(t)$, using the noncentral chi-squared distribution as an example.
  • Figure 5: Comparison between IMLE and RS-IMLE on a 2D toy problem. Data points are denoted by ${\color{ForestGreen}\mdblksquare}$ and samples are denoted by ${\color{circlecolor}\mdblkcircle}$. Samples picked as nearest neighbours are denoted by $\color{blue} \bigstar$.
  • ...and 6 more figures
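
The captions for Figures 1 and 5 describe RS-IMLE as changing the training prior through rejection sampling and then picking nearest-neighbour samples for each data point. The following is a minimal sketch of one such selection step on a 2D toy problem, not the authors' implementation. It assumes that $n$ is the number of data points, $m$ the number of retained samples, and $\epsilon$ a distance threshold, and that a sample is rejected when its output lies within $\epsilon$ of any data point; this summary does not spell out those details. The names `rs_imle_select`, `generator`, and `max_rounds` are hypothetical.

```python
import numpy as np


def pairwise_dists(a, b):
    """Euclidean distances between rows of a (p, d) and rows of b (q, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=-1))


def rs_imle_select(data, generator, m, eps, latent_dim, rng, max_rounds=100):
    """Pick one latent code per data point from a rejection-sampled pool.

    Assumed rule (not confirmed by this summary): a latent code z drawn from
    N(0, I) is rejected if generator(z) is closer than eps to ANY data point.
    Among the accepted codes, each data point keeps its nearest neighbour.
    """
    kept_z, kept_x, n_kept = [], [], 0
    for _ in range(max_rounds):
        z = rng.standard_normal((m, latent_dim))  # proposals from N(0, I)
        x = generator(z)
        # Accept a sample only if it is at least eps away from every data point.
        accept = pairwise_dists(x, data).min(axis=1) >= eps
        kept_z.append(z[accept])
        kept_x.append(x[accept])
        n_kept += int(accept.sum())
        if n_kept >= m:
            break
    z = np.concatenate(kept_z)[:m]
    x = np.concatenate(kept_x)[:m]
    if len(z) == 0:
        raise RuntimeError("no samples accepted; eps may be too large")
    # Nearest accepted sample for each data point (the IMLE selection step).
    d = pairwise_dists(data, x)
    nearest = d.argmin(axis=1)
    return z[nearest], d[np.arange(len(data)), nearest]
```

With `eps=0` every proposal is accepted and the step reduces to plain nearest-neighbour selection from $\mathcal{N}(0,I)$ samples. In a training loop, the returned latent codes would then be used to pull the generator's outputs towards their matched data points; that update is omitted here.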