DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling
Kyuheon Jung, Yongdeuk Seo, Seongwoo Cho, Jaeyoung Kim, Hyun-seok Min, Sungchul Choi
TL;DR
DALDA tackles data scarcity by combining an LLM, which enriches text prompts with class-specific semantic information, with diffusion-based image synthesis, using CLIPScore to adaptively balance image- and text-driven cues. The core contribution is Adaptive Guidance Scaling (AGS), implemented via IP-Adapter cross-attention, which sets the relative emphasis on text versus image guidance by sampling the guidance weight $\lambda$ from a truncated normal distribution chosen according to each sample's CLIPScore. Empirical results on HC and LC few-shot benchmarks show increased synthetic-data diversity and improved downstream accuracy, with the strongest gains when LLM-generated prompts and AGS are used together, while adherence to the target distribution is maintained in challenging low-CLIPScore cases. The approach requires no additional fine-tuning of the diffusion model and provides a principled mechanism for trading off diversity against distributional fidelity, which makes it a scalable option for real-world, data-scarce tasks.
Abstract
In this paper, we present an effective data augmentation framework that leverages a Large Language Model (LLM) and a Diffusion Model (DM) to tackle the challenges inherent in data-scarce scenarios. Recently, DMs have opened up the possibility of generating synthetic images to complement a few training images. However, increasing the diversity of synthetic images also raises the risk of generating samples outside the target distribution. Our approach addresses this issue by embedding novel semantic information into text prompts via an LLM and utilizing real images as visual prompts, thus generating semantically rich images. To ensure that the generated images remain within the target distribution, we dynamically adjust the guidance weight based on each image's CLIPScore to control the diversity. Experimental results show that our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution. Consequently, our approach proves to be more efficient in the few-shot setting on several benchmarks. Our code is available at https://github.com/kkyuhun94/dalda.
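To make the idea of adjusting the guidance weight per image more concrete, the following is a minimal Python sketch of how a weight could be drawn from a truncated normal distribution selected by CLIPScore. It is an illustration under stated assumptions, not the authors' implementation: the function names, the threshold, and every mean, standard deviation, and range are hypothetical placeholders, and the sketch assumes that a larger weight means stronger reliance on the image prompt and that low-CLIPScore images should lean more on the image prompt to stay close to the target distribution.

```python
import random


def sample_truncated_normal(mean, std, low, high, rng):
    """Draw from N(mean, std^2) restricted to [low, high] via rejection sampling."""
    if not (low < high) or std <= 0:
        raise ValueError("require low < high and std > 0")
    while True:
        x = rng.gauss(mean, std)
        if low <= x <= high:
            return x


def sample_guidance_weight(clip_score, threshold=0.3, rng=None):
    """Pick an image-guidance weight for one real image (hypothetical helper).

    All numeric values below are placeholders chosen for illustration only;
    they are not taken from the paper.
    """
    if rng is None:
        rng = random.Random()
    if clip_score < threshold:
        # Assumed behaviour: a low CLIPScore signals a hard sample, so the
        # weight is drawn from a range that favours the image prompt.
        return sample_truncated_normal(0.7, 0.1, 0.5, 0.9, rng)
    # Assumed behaviour: a high CLIPScore leaves room for diversity, so the
    # weight is drawn from a range that favours the text prompt.
    return sample_truncated_normal(0.3, 0.1, 0.1, 0.5, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for score in (0.15, 0.45):
        weight = sample_guidance_weight(score, rng=rng)
        print(f"CLIPScore={score:.2f} -> sampled weight={weight:.3f}")
```

In a pipeline along these lines, the sampled weight would be passed as the image-prompt scale when generating each synthetic image, so that the text prompt dominates only when the reference image is already well aligned with its class.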
