HiGFA: Hierarchical Guidance for Fine-grained Data Augmentation with Diffusion Models
Zhiguang Lu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang
TL;DR
HiGFA tackles data scarcity in fine-grained visual classification (FGVC) by integrating three guidance streams within a diffusion model: text prompts for global diversity, transformed contour maps for structure, and a fine-grained classifier for category fidelity. Exploiting the coarse-to-fine nature of diffusion generation, HiGFA uses a dynamic, confidence-aware schedule that activates fine-grained guidance only when needed, preserving diversity while maintaining fidelity. Experiments on six FGVC benchmarks, including few-shot settings and ViT backbones, show consistent improvements over traditional augmentations and prior diffusion-based methods. The results indicate that hierarchical, adaptive guidance can produce diverse, high-quality synthetic FGVC data that improves downstream classifier performance.
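To make "three guidance streams" concrete, the sketch below shows one way the signals could be combined in a single denoising step. It is an illustration under stated assumptions, not the authors' formulation: text and contour guidance are assumed to enter as separate conditional noise predictions composed additively (in the style of Composable Diffusion), and the classifier term follows the standard classifier-guidance update of Dhariwal and Nichol. The function name combine_guidance and all argument names are hypothetical.

```python
import numpy as np


def combine_guidance(eps_uncond, eps_text, eps_contour, cls_grad,
                     w_text, w_contour, w_cls, alpha_bar_t):
    """Hypothetical composition of text, contour, and classifier guidance.

    eps_*       : noise predictions (unconditional, text-conditioned,
                  contour-conditioned) for the current noisy latent x_t.
    cls_grad    : gradient of log p(y | x_t) from a fine-grained classifier.
    w_*         : guidance strengths (assumed, set by some schedule).
    alpha_bar_t : cumulative noise-schedule product at step t.
    """
    # Additive composition of conditional directions (assumption).
    eps = (eps_uncond
           + w_text * (eps_text - eps_uncond)
           + w_contour * (eps_contour - eps_uncond))
    # Classifier guidance: move the noise estimate against the gradient of
    # log p(y | x_t), scaled by sqrt(1 - alpha_bar_t).
    eps = eps - w_cls * np.sqrt(1.0 - alpha_bar_t) * cls_grad
    return eps


# Usage with toy arrays: with w_cls = 0 the classifier term vanishes.
shape = (4, 8, 8)
guided = combine_guidance(np.zeros(shape), np.ones(shape), 2 * np.ones(shape),
                          np.ones(shape), w_text=2.0, w_contour=0.5,
                          w_cls=0.0, alpha_bar_t=0.75)
```

Setting w_cls to zero recovers plain multi-condition classifier-free guidance, which is the mode the abstract describes for the early-to-mid sampling stages.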
Abstract
Generative diffusion models show promise for data augmentation. However, applying them to fine-grained tasks presents a significant challenge: ensuring that synthetic images accurately capture the subtle, category-defining features critical for high fidelity. Standard approaches, such as text-based Classifier-Free Guidance (CFG), often lack the required specificity and can generate misleading examples that degrade fine-grained classifier performance. To address this, we propose Hierarchically Guided Fine-grained Augmentation (HiGFA). HiGFA leverages the temporal dynamics of the diffusion sampling process. It employs strong text and transformed contour guidance with fixed strengths in the early-to-mid sampling stages to establish the overall scene, style, and structure. In the final sampling stages, HiGFA activates a specialized fine-grained classifier guidance and dynamically modulates the strength of all guidance signals based on prediction confidence. This hierarchical, confidence-driven orchestration enables HiGFA to generate diverse yet faithful synthetic images by balancing global structure formation with precise detail refinement. Experiments on several fine-grained visual classification (FGVC) datasets demonstrate the effectiveness of HiGFA.
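The staged, confidence-driven orchestration described above can be sketched as a per-step weight schedule. This is a minimal sketch under loudly stated assumptions: the abstract does not give the boundary of the "final stages", the default strengths, or the exact modulation rule, so late_frac, conf_threshold, the default weights, and the linear rule that shifts strength toward the classifier when its confidence is low are all hypothetical choices made for illustration, as are the names schedule_weights and GuidanceWeights.

```python
from dataclasses import dataclass


@dataclass
class GuidanceWeights:
    text: float
    contour: float
    cls: float


def schedule_weights(step, num_steps, confidence,
                     w_text=7.5, w_contour=1.0, w_cls=2.0,
                     late_frac=0.8, conf_threshold=0.9):
    """Hypothetical confidence-aware guidance schedule.

    step runs from 0 (pure noise) to num_steps - 1 (final sampling step).
    confidence is the classifier's predicted probability of the target
    class on the current sample, assumed to lie in [0, 1].
    """
    progress = step / max(num_steps - 1, 1)
    if progress < late_frac:
        # Early-to-mid stages: fixed text and contour strengths,
        # no fine-grained classifier guidance.
        return GuidanceWeights(w_text, w_contour, 0.0)
    if confidence >= conf_threshold:
        # Final stages, classifier already confident: guidance not needed.
        return GuidanceWeights(w_text, w_contour, 0.0)
    # Final stages, low confidence: strengthen classifier guidance and
    # relax text/contour guidance in proportion to the confidence deficit.
    deficit = (conf_threshold - confidence) / conf_threshold  # in (0, 1]
    return GuidanceWeights(
        text=w_text * (1.0 - 0.5 * deficit),
        contour=w_contour * (1.0 - 0.5 * deficit),
        cls=w_cls * deficit,
    )


# Usage: weights at an early step and at the last step of a 50-step sampler.
early = schedule_weights(0, 50, confidence=0.1)
late_unsure = schedule_weights(49, 50, confidence=0.0)
late_sure = schedule_weights(49, 50, confidence=0.95)
```

Keeping the classifier term off until the late steps reflects the coarse-to-fine intuition in the abstract: early steps fix layout and structure, where a fine-grained classifier offers little useful signal, while late steps refine the details that distinguish categories.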
