
A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot

Milad Abdollahzadeh, Guimeng Liu, Touba Malekzadeh, Christopher T. H. Teo, Keshigeyan Chandrasegaran, Ngai-Man Cheung

TL;DR

This survey tackles Generative Modeling under Data Constraint (GM-DC), addressing how GANs, VAEs, and diffusion models perform under limited, few-shot, and zero-shot data. It introduces two novel taxonomies, one for GM-DC tasks and another for GM-DC approaches, and provides a comprehensive review of 230+ papers, complemented by a Sankey diagram to map task-approach-method interactions. The work highlights core challenges (overfitting, frequency bias, distant-domain transfer, evaluation) and synthesizes practical recommendations across transfer learning, data augmentation, architectural design, multi-task objectives, frequency-aware methods, meta-learning, and internal patch distribution modeling. It also outlines future directions, including leveraging foundation models, robust zero-shot grounding, distant-domain transfer, holistic evaluation, and data-centric strategies, aiming to guide researchers and practitioners in advancing GM-DC.

Abstract

Generative modeling in machine learning aims to synthesize new data samples that are statistically similar to those observed during training. While conventional generative models such as GANs and diffusion models typically assume access to large and diverse datasets, many real-world applications (e.g. in medicine, satellite imaging, and artistic domains) operate under limited data availability and strict constraints. In this survey, we examine Generative Modeling under Data Constraint (GM-DC), which includes limited-data, few-shot, and zero-shot settings. We present a unified perspective on the key challenges in GM-DC, including overfitting, frequency bias, and incompatible knowledge transfer, and discuss how these issues impact model performance. To systematically analyze this growing field, we introduce two novel taxonomies: one categorizing GM-DC tasks (e.g. unconditional vs. conditional generation, cross-domain adaptation, and subject-driven modeling), and another organizing methodological approaches (e.g. transfer learning, data augmentation, meta-learning, and frequency-aware modeling). Our study reviews over 230 papers, offering a comprehensive view across generative model types and constraint scenarios. We further analyze task-approach-method interactions using a Sankey diagram and highlight promising directions for future work, including adaptation of foundation models, holistic evaluation frameworks, and data-centric strategies for sample selection. This survey provides a timely and practical roadmap for researchers and practitioners aiming to advance generative modeling under limited data. Project website: https://sutd-visual-computing-group.github.io/gmdc-survey/.

Paper Structure (64 sections, 8 equations, 12 figures, 11 tables)

Figures (12)

  • Figure 1: Research Landscape of GM-DC. The figure shows the interaction between GM-DC tasks and approaches (main and sub categories), and representative GM-DC methods. Tasks are defined in our proposed taxonomy in Tab. \ref{tab:tasktaxonomy}, and approaches in our proposed taxonomy in Tab. \ref{tab:approaches}. An interactive version of this diagram is available at our project website: https://sutd-visual-computing-group.github.io/gmdc-survey/. Best viewed in color and with zoom.
  • Figure 2: Overall publication statistics in GM-DC. GM-DC Publications (Left): GM-DC publication trends indicate rising interest in this area. We remark that the previous survey \cite{li2022degan} covers only $\sim$13% of the publications discussed in our survey. Publication Venues (Right): The distribution of publications across major machine learning and computer vision venues, other venues, and arXiv. Best viewed in color.
  • Figure 3: Analysis of publications in GM-DC. Data Constraints: Different types of data constraints studied in GM-DC; see Sec. \ref{sec:background} for more details on setups. Models: Different types of models studied, including Generative Adversarial Network (GAN), Diffusion Model (DM), and Variational Auto-Encoder (VAE). Tasks: Different GM-DC tasks that are studied; see Sec. \ref{ssec:tasks} and Tab. \ref{tab:tasktaxonomy} for details on task definitions in our proposed task taxonomy. Approaches: Different approaches that are applied for addressing GM-DC; more details on our proposed taxonomy of approaches can be found in Sec. \ref{sec:comprehensive_review} and Tab. \ref{tab:approaches}. Best viewed in color.
  • Figure 4: Illustration of the timeline when a GM-DC task/approach was introduced based on our proposed taxonomies: task taxonomy (details in Sec. \ref{ssec:tasks} and Tab. \ref{tab:tasktaxonomy}), and approach taxonomy (details in Sec. \ref{sec:comprehensive_review} and Tab. \ref{tab:approaches}). Best viewed in color.
  • Figure 5: Source-target domain proximity visualization indicates that distant/remote target domains have not been explored in GM-DC setups and are very challenging. We use FFHQ \cite{karras2019style} as the source domain. We show source-target domain proximity qualitatively by visualizing Inception-v3 \cite{szegedy2016labelsmoothing} (Left), LPIPS \cite{zhang2018lpips} (Middle), and DreamSim \cite{fu2023dreamsim} (Right) features. For feature visualization, we use t-SNE \cite{JMLR:v9:vandermaaten08a_tsne} and show centroids ($\bigtriangleup$) for all domains. The feature visualizations show that the additional setups, Flowers \cite{nilsback2008oxford_flower} and Church \cite{yu2015lsun}, represent target domains that are remote from the source domain (FFHQ) compared to target domains used in the literature. This indicates that the exploration of distant/remote target domains under GM-DC setups has not been pursued and poses notable challenges (Fig. \ref{fig:proximity-measurements-and-10-shot-flowers}). Best viewed in color.
  • ...and 7 more figures
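
The Figure 5 caption describes comparing source and target domains through feature centroids. The snippet below is a minimal sketch of that idea, not the survey's procedure: it assumes feature vectors have already been extracted, uses toy 2-D vectors in place of real Inception-v3, LPIPS, or DreamSim features, and ranks domains by Euclidean distance between centroids, which is an illustrative choice (the caption only shows a qualitative t-SNE visualization). The function names `centroid` and `rank_by_proximity` and the domain names are hypothetical.

```python
import math


def centroid(features):
    """Return the column-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def rank_by_proximity(source, targets):
    """Sort target domains by centroid distance to the source (nearest first).

    Assumption: Euclidean distance between centroids stands in for
    "domain proximity"; the caption itself relies on t-SNE plots.
    """
    source_centroid = centroid(source)
    distances = {
        name: math.dist(source_centroid, centroid(feats))
        for name, feats in targets.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Toy 2-D vectors standing in for extracted features (hypothetical data).
source_feats = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
target_feats = {
    "near_domain": [[1.0, 2.0], [3.0, 2.0]],
    "far_domain": [[4.0, 5.0], [4.0, 5.0]],
}

ranking = rank_by_proximity(source_feats, target_feats)
for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

With real data, each list would hold one feature vector per image, and a target domain whose centroid lies far from the source centroid would be a candidate "distant" domain in the sense of the caption.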