Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis
Chirag Vashist, Shichong Peng, Ke Li
TL;DR
The paper tackles few-shot image synthesis by identifying a latent-space misalignment in IMLE: the latent codes selected during training do not match the codes drawn from $\mathcal{N}(0,I)$ at test time. It introduces RS-IMLE, which designs a target prior $\mathcal{P}$ and uses rejection sampling to align training with inference. A theoretical analysis links the misalignment to distance CDFs and gives an explicit PDF relation involving $\phi(t)$. Across nine datasets, the approach improves image quality and mode coverage, with an average FID reduction of about 45.9% over strong baselines, supported by quantitative metrics and qualitative assessments such as Visual Recall and latent-space interpolation. The method makes better use of limited data and offers a practical, scalable prior-design strategy. The quantities $\mathcal{N}(0,I)$, $\mathcal{P}$, $m$, $n$, and $\epsilon$ are central to the construction and evaluation of the framework.
Abstract
An emerging area of research aims to learn deep generative models with limited training data. Prior generative models like GANs and diffusion models require a lot of data to perform well, and their performance degrades when they are trained on only a small amount of data. A recent technique called Implicit Maximum Likelihood Estimation (IMLE) has been adapted to the few-shot setting, achieving state-of-the-art performance. However, current IMLE-based approaches encounter challenges due to inadequate correspondence between the latent codes selected for training and those drawn during inference. This results in suboptimal test-time performance. We theoretically show a way to address this issue and propose RS-IMLE, a novel approach that changes the prior distribution used for training. This leads to substantially higher quality image generation compared to existing GAN and IMLE-based methods, as validated by comprehensive experiments conducted on nine few-shot image datasets.
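To make the idea of changing the training prior through rejection sampling concrete, here is a minimal NumPy sketch of one latent-selection step. It is an illustration under stated assumptions, not the authors' implementation: the rejection rule (discard a sample when its output lies within `eps` of any training example), the Euclidean distance, and the names `select_latents`, `generator`, `m`, and `eps` are all hypothetical choices that are not specified in the text above.

```python
import numpy as np


def select_latents(generator, data, m, eps, latent_dim, rng):
    """Pick one latent code per data point after a rejection step.

    Assumption (not stated in the summary above): a candidate is rejected
    when its generated output lies within `eps` of ANY data point.
    """
    # Draw m candidate latent codes from the standard normal prior N(0, I).
    z = rng.standard_normal((m, latent_dim))
    out = generator(z)  # shape (m, d)

    # Pairwise Euclidean distances between data and outputs, shape (n, m).
    dist = np.linalg.norm(data[:, None, :] - out[None, :, :], axis=-1)

    # Keep only candidates that are at least eps away from every data point.
    keep = (dist >= eps).all(axis=0)
    if not keep.any():
        raise ValueError("all candidates rejected; lower eps or raise m")

    z_kept, dist_kept = z[keep], dist[:, keep]

    # IMLE-style selection: nearest surviving candidate for each data point.
    nearest = dist_kept.argmin(axis=1)
    return z_kept[nearest], dist_kept[np.arange(len(data)), nearest]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
    # Toy generator for illustration only: the identity map on latent codes.
    codes, dists = select_latents(lambda z: z, data, m=64, eps=0.1,
                                  latent_dim=2, rng=rng)
    print(codes.shape, dists.min() >= 0.1)
```

In a training loop, the selected codes would then be paired with their data points and the generator updated to reduce the corresponding distances; that step is omitted here because it depends on the model and loss, which the summary does not describe.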
