Concept-Aware LoRA for Domain-Aligned Segmentation Dataset Generation

Minho Park, Sunghyun Park, Jungsoo Lee, Hyojin Park, Kyuwoong Hwang, Fatih Porikli, Jaegul Choo, Sungha Choi

TL;DR

This work tackles data scarcity in semantic segmentation by generating labeled data with text-to-image models, addressing two core challenges: domain alignment and informativeness. It introduces Concept-Aware LoRA (CA-LoRA), a selective fine-tuning method that updates only concept-relevant weights, aligning generated images with the target domain while preserving pretrained knowledge to maintain diversity. To identify which projection weights correspond to a desired concept, such as viewpoint or style, CA-LoRA uses concept sensitivity: the gradient of a concept loss normalized by the gradient of the original diffusion loss. In urban-scene segmentation experiments, CA-LoRA achieves state-of-the-art performance in in-domain settings (few-shot and fully supervised) and in domain generalization, with efficient training and improved image-label alignment, making it a practical approach to scalable, robust dataset generation.
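
A minimal sketch of the gradient-ratio idea follows, in plain Python. It assumes the per-layer gradients of the concept loss and of the diffusion loss have already been obtained (for example, by backpropagating each loss separately through the frozen T2I model) and flattened into lists of floats. Using the ratio of L2 norms, the `eps` term, the top-fraction selection rule, the function names `concept_sensitivity` and `select_sensitive_layers`, and the layer keys are all illustrative assumptions, not the paper's exact definition.

```python
import math


def l2_norm(values):
    """Euclidean (L2) norm of a flattened gradient."""
    return math.sqrt(sum(v * v for v in values))


def concept_sensitivity(concept_grads, diffusion_grads, eps=1e-12):
    """Per-layer ratio of concept-loss gradient norm to diffusion-loss gradient norm.

    Both arguments map a layer name to its flattened gradient (list of floats).
    Dividing by the diffusion gradient discounts layers that react strongly to
    any training signal, not specifically to the target concept.
    (Norm ratio is an assumed form of the paper's gradient-based ratio.)
    """
    return {
        name: l2_norm(concept_grads[name]) / (l2_norm(diffusion_grads[name]) + eps)
        for name in concept_grads
    }


def select_sensitive_layers(sensitivity, ratio):
    """Return the names of the top `ratio` fraction of layers (at least one)."""
    k = max(1, round(len(sensitivity) * ratio))
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    return ranked[:k]


# Toy gradients for three hypothetical projection layers.
concept = {"attn1.to_q": [3.0, 4.0], "attn1.to_k": [0.6, 0.8], "attn2.to_v": [6.0, 8.0]}
diffusion = {"attn1.to_q": [1.0, 0.0], "attn1.to_k": [1.0, 0.0], "attn2.to_v": [10.0, 0.0]}

scores = concept_sensitivity(concept, diffusion)
print(select_sensitive_layers(scores, ratio=0.34))  # ['attn1.to_q']
```

In the toy data, `attn2.to_v` has the largest raw concept gradient, yet its sensitivity is only about 1 because its diffusion gradient is equally large; the normalization is what singles out `attn1.to_q`.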

Abstract

This paper addresses the challenge of data scarcity in semantic segmentation by generating datasets with text-to-image (T2I) models, reducing image acquisition and labeling costs. Segmentation dataset generation faces two key challenges: 1) aligning generated samples with the target domain and 2) producing informative samples beyond the training data. Fine-tuning T2I models can help generate samples aligned with the target domain. However, fine-tuned models often overfit and memorize the training data, limiting their ability to generate diverse and well-aligned samples. To overcome these issues, we propose Concept-Aware LoRA (CA-LoRA), a novel fine-tuning approach that selectively identifies and updates only the weights associated with the concepts needed for domain alignment (e.g., style or viewpoint), while preserving the pretrained knowledge of the T2I model to produce informative samples. We demonstrate its effectiveness in generating datasets for urban-scene segmentation, where it outperforms baseline and state-of-the-art methods in in-domain (few-shot and fully supervised) settings as well as in domain generalization tasks, particularly under challenging conditions such as adverse weather and varying illumination.
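
The selective update described above can be sketched in plain Python: every projection layer keeps its pretrained weight frozen, and a trainable low-rank pair (A, B) is attached only to the layers in a chosen set, for example the most concept-sensitive ones. This is a generic LoRA sketch, with output W x + (alpha / rank) B A x, under stated assumptions; the classes `FrozenLinear` and `LoRALinear`, the helper `attach_lora`, the rank and alpha values, and the layer names are hypothetical and not the authors' implementation.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class FrozenLinear:
    """A projection layer whose pretrained weight is never updated."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, x):
        return matvec(self.weight, x)

    def trainable_params(self):
        return 0


class LoRALinear(FrozenLinear):
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B A."""

    def __init__(self, weight, rank=2, alpha=2.0, seed=0):
        super().__init__(weight)
        out_dim, in_dim = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: rank x in_dim, small random init; B: out_dim x rank, zero init,
        # so B A is zero and the layer starts at the pretrained output.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def __call__(self, x):
        base = super().__call__(x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


def attach_lora(weights, selected, rank=2):
    """Wrap only the layers named in `selected` with LoRA; keep the rest frozen."""
    return {
        name: LoRALinear(w, rank=rank) if name in selected else FrozenLinear(w)
        for name, w in weights.items()
    }


# Two hypothetical projection layers; only the first is treated as concept-sensitive.
weights = {
    "attn1.to_q": [[1.0, 0.0], [0.0, 1.0]],
    "attn1.to_k": [[2.0, 0.0], [0.0, 2.0]],
}
model = attach_lora(weights, selected={"attn1.to_q"}, rank=1)
print(model["attn1.to_q"]([1.0, 2.0]))  # [1.0, 2.0]: B is zero, so output equals the pretrained one
print(sum(layer.trainable_params() for layer in model.values()))  # 4
```

Because `B` starts at zero, the wrapped model initially reproduces the pretrained outputs exactly, and only the selected layers contribute trainable parameters; the unselected layers keep the pretrained behavior untouched.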

Paper Structure

This paper contains 65 sections, 10 equations, 22 figures, and 16 tables.

Figures (22)

  • Figure 1: Motivation of Concept-Aware LoRA (CA-LoRA). Pretrained T2I models generate informative images but struggle with viewpoint alignment. LoRA fine-tuning on Cityscapes enables driving-viewpoint generation but leads to overfitting to the Cityscapes style and content. We aim to learn only the desired concept (e.g., viewpoint) for generating domain-aligned, informative samples.
  • Figure 2: Overview of the proposed framework for generating an urban-scene segmentation dataset by learning the Cityscapes viewpoint. The process consists of four stages: (1) identifying sensitive weights for a specific concept, (2) selectively fine-tuning them with LoRA, (3) training a label generator using features from the T2I model, and (4) generating diverse image-label pairs with augmented prompts.
  • Figure 3: Overview of measuring concept sensitivity. (a) We design the concept loss ($\mathcal{L}_\text{Concept}$) with concept-augmented captions ($c_\text{Aug}$) and compute the original diffusion loss ($\mathcal{L}_\text{Diffusion}$) with the added noise $\epsilon$. The concept-augmented captions can be changed according to the desired concept (e.g., style, viewpoint). (b) Each concept gradient reflects how a layer reacts to the concept, but it must be normalized by the original diffusion gradient to measure the relative increase in each layer.
  • Figure 4: Illustration of CA-LoRA. Unlike the original LoRA, CA-LoRA selectively attaches LoRA layers to a specified proportion of the projection layers, namely those most sensitive to the desired concept.
  • Figure 5: Qualitative comparison of image-label pairs from DatasetDM, the original LoRA, and our CA-LoRA variants in a few-shot setting (Cityscapes, 0.3%). The pretrained model produces images misaligned with the viewpoint and style of the target domain, while the original LoRA memorizes training examples. In contrast, CA-LoRA selectively learns either the style or the viewpoint concept from the source dataset.
  • ...and 17 more figures
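
Stage (4) in Figure 2 relies on augmented prompts to obtain diverse image-label pairs, and the abstract highlights adverse weather and varying illumination. One simple way to diversify prompts, sketched below under stated assumptions, is to pair a base caption with distinct weather and lighting attributes. The attribute pools, the prompt template, and the function `augment_prompts` are hypothetical; the paper's actual augmentation scheme may differ.

```python
import itertools
import random

# Hypothetical attribute pools for adverse weather and varying illumination.
WEATHER = ["clear", "rainy", "snowy", "foggy"]
LIGHTING = ["in daylight", "at dusk", "at night"]


def augment_prompts(base_caption, n, seed=0):
    """Return up to n distinct prompts pairing a base caption with weather/lighting.

    At most len(WEATHER) * len(LIGHTING) prompts can be produced; a fixed seed
    makes the sampled combinations reproducible.
    """
    combos = list(itertools.product(WEATHER, LIGHTING))
    random.Random(seed).shuffle(combos)
    return [f"{base_caption}, {weather} weather, {light}" for weather, light in combos[:n]]


for prompt in augment_prompts("a photo of a city street with cars and pedestrians", 3):
    print(prompt)
```

Sampling combinations without replacement keeps every prompt in a batch distinct, which is one way to push generated samples beyond the conditions seen in the training data.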