ChimeraLoRA: Multi-Head LoRA-Guided Synthetic Datasets
Hoyoung Kim, Minwoo Jang, Jabin Koo, Sangdoo Yun, Jungseul Ok
TL;DR
ChimeraLoRA addresses data scarcity in specialized and long-tailed domains by unifying class-level priors and image-specific details in a multi-head LoRA framework. It uses a shared adapter $A$ for class priors and per-image adapters $\mathcal{B}$ for image-specific details, adds semantic boosting with bounding boxes during training, and generates images by merging the per-image heads with weights $w\sim\text{Dirichlet}(\boldsymbol{\alpha})$ into a single head $B'$. The method demonstrates improved downstream accuracy and a reduced synthetic-to-real gap across diverse datasets, including medical and long-tail tasks, while using fewer trainable parameters than baselines. These results suggest practical viability for few-shot learning regimes where data collection is constrained, enabling more robust and diverse synthetic datasets for training. The approach offers a principled way to balance fidelity and diversity in diffusion-model–based data augmentation and highlights opportunities for extending semantic-aware augmentation with soft labels or per-semantic sampling.
Abstract
Beyond general recognition tasks, specialized domains such as privacy-constrained medical applications and fine-grained settings often face data scarcity, especially for tail classes. To obtain less biased and more reliable models under such scarcity, practitioners use diffusion models to supplement underrepresented regions of real data. Specifically, recent studies fine-tune pretrained diffusion models with LoRA on few-shot real sets to synthesize additional images. An image-wise LoRA trained on a single image captures fine-grained details but offers limited diversity, whereas a class-wise LoRA trained over all shots encodes class priors and therefore produces diverse images but tends to overlook fine details. To combine both benefits, we separate the adapter into a class-shared LoRA $A$ for class priors and per-image LoRAs $\mathcal{B}$ for image-specific characteristics. To expose coherent class semantics in the shared LoRA $A$, we propose semantic boosting, which preserves class bounding boxes during training. For generation, we compose $A$ with a mixture of $\mathcal{B}$ using coefficients drawn from a Dirichlet distribution. Across diverse datasets, our synthesized images are both diverse and detail-rich while closely aligning with the few-shot real distribution, yielding robust gains in downstream classification accuracy.
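
The generation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the usual LoRA convention in which the low-rank weight update is the product of an up-projection and a down-projection, here $\Delta W = B'A$ with $B' = \sum_i w_i B_i$. The function names `merge_heads` and `lora_delta`, the matrix shapes, and the uniform choice of $\boldsymbol{\alpha}$ are all hypothetical and chosen only for illustration.

```python
import numpy as np


def merge_heads(B_heads, alpha, rng):
    """Sample w ~ Dirichlet(alpha) and return (B', w), where B' = sum_i w_i * B_i.

    B_heads: array of shape (K, d_out, r), one per-image head per few-shot image
             (shape convention is an assumption of this sketch).
    alpha:   array of shape (K,), Dirichlet concentration parameters.
    """
    B_heads = np.asarray(B_heads, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if alpha.shape != (B_heads.shape[0],):
        raise ValueError("alpha must have one entry per head")
    w = rng.dirichlet(alpha)                      # non-negative, sums to 1
    B_merged = np.tensordot(w, B_heads, axes=1)   # weighted sum over the K heads
    return B_merged, w


def lora_delta(B_merged, A):
    """Low-rank weight update from the merged head and the shared adapter.

    Assumes the standard LoRA form Delta W = B' @ A, with A of shape (r, d_in).
    """
    return B_merged @ A


# Toy usage with hypothetical sizes: 4 shots, rank-2 adapters.
rng = np.random.default_rng(0)
K, d_out, r, d_in = 4, 6, 2, 5
B_heads = rng.normal(size=(K, d_out, r))   # stand-ins for trained per-image heads
A = rng.normal(size=(r, d_in))             # stand-in for the trained shared adapter
B_prime, w = merge_heads(B_heads, np.ones(K), rng)
delta_W = lora_delta(B_prime, A)           # shape (d_out, d_in), rank at most r
```

Under this reading, each sampled $w$ yields a different convex combination of the per-image heads, so repeated sampling varies the image-specific component while the shared $A$ stays fixed. Smaller entries of $\boldsymbol{\alpha}$ concentrate the weight on a few heads, and larger entries spread it more evenly.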
