Concept-Aware LoRA for Domain-Aligned Segmentation Dataset Generation
Minho Park, Sunghyun Park, Jungsoo Lee, Hyojin Park, Kyuwoong Hwang, Fatih Porikli, Jaegul Choo, Sungha Choi
TL;DR
This work tackles data scarcity in semantic segmentation by generating labeled data with text-to-image models, addressing two core challenges: domain alignment and informativeness. It introduces Concept-Aware LoRA (CA-LoRA), a selective fine-tuning method that updates only concept-relevant weights, aligning generated imagery with the target domain while preserving pretrained knowledge to maintain diversity. CA-LoRA relies on concept sensitivity, computed as a gradient-based ratio between concept loss and diffusion loss, to identify which projection weights correspond to desired concepts such as viewpoint or style. Experiments on urban-scene segmentation show CA-LoRA achieving state-of-the-art performance in in-domain settings (few-shot and fully supervised) and in domain generalization, with efficient training and improved image-label alignment.
Abstract
This paper addresses the challenge of data scarcity in semantic segmentation by generating datasets through text-to-image (T2I) generation models, reducing image acquisition and labeling costs. Segmentation dataset generation faces two key challenges: 1) aligning generated samples with the target domain and 2) producing informative samples beyond the training data. Fine-tuning a T2I model can help generate samples aligned with the target domain. However, the fine-tuned model often overfits and memorizes the training data, limiting its ability to generate diverse and well-aligned samples. To overcome these issues, we propose Concept-Aware LoRA (CA-LoRA), a novel fine-tuning approach that selectively identifies and updates only the weights associated with the concepts necessary for domain alignment (e.g., style or viewpoint), while preserving the pretrained knowledge of the T2I model to produce informative samples. We demonstrate its effectiveness in generating datasets for urban-scene segmentation, where it outperforms baseline and state-of-the-art methods in in-domain (few-shot and fully supervised) settings as well as in domain generalization tasks, especially under challenging conditions such as adverse weather and varying illumination.
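The selection idea summarized in the TL;DR (score each projection weight by a gradient-based ratio between concept loss and diffusion loss, then fine-tune only the most concept-sensitive weights) can be illustrated with a minimal sketch. This is not the authors' implementation: the exact form of the ratio, the use of per-layer gradient norms, the top-k selection rule, and the layer names and numbers below are all assumptions made for illustration.

```python
from typing import Dict, List


def concept_sensitivity(
    concept_grad_norms: Dict[str, float],
    diffusion_grad_norms: Dict[str, float],
    eps: float = 1e-12,
) -> Dict[str, float]:
    """Ratio of concept-loss gradient norm to diffusion-loss gradient norm.

    Assumption: one scalar gradient norm per projection layer and a plain
    ratio; the paper may define the quantity differently.
    """
    return {
        name: concept_grad_norms[name] / (diffusion_grad_norms[name] + eps)
        for name in concept_grad_norms
    }


def select_layers(sensitivity: Dict[str, float], top_k: int) -> List[str]:
    """Return the top_k layer names ranked by sensitivity (hypothetical rule)."""
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    return ranked[:top_k]


# Hypothetical layer names and gradient norms, for illustration only.
concept_norms = {"attn1.to_q": 0.8, "attn1.to_k": 0.1, "attn2.to_v": 0.6}
diffusion_norms = {"attn1.to_q": 0.2, "attn1.to_k": 0.5, "attn2.to_v": 0.3}

scores = concept_sensitivity(concept_norms, diffusion_norms)
selected = select_layers(scores, top_k=2)
print(selected)
```

In this sketch, only the layers in `selected` would receive LoRA adapters; the remaining layers would stay frozen, which is one way to read the stated goal of preserving pretrained knowledge.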
