CAT: Contrastive Adapter Training for Personalized Image Generation

Jae Wan Park, Sang Hyun Park, Jun Young Koh, Junha Lee, Min Song

TL;DR

This work presents Contrastive Adapter Training (CAT), a simple yet effective strategy that enhances adapter training through a CAT loss, which helps preserve the base model's original knowledge when adapters are applied.

Abstract

The emergence of various adapters, including Low-Rank Adaptation (LoRA) borrowed from natural language processing, has allowed diffusion models to personalize image generation at low cost. However, due to challenges including limited datasets, insufficient regularization, and limited computational resources, adapter training often produces unsatisfactory outcomes and corrupts the backbone model's prior knowledge. One well-known symptom is a loss of diversity in object generation, especially within the same class, where the model generates almost identical objects with only minor variations. This limits the model's generative capability. To solve this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy that enhances adapter training through a CAT loss. Our approach helps preserve the base model's original knowledge when adapters are applied. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate how well CAT retains this prior knowledge. We evaluate CAT's improvements both qualitatively and quantitatively. Finally, we discuss CAT's potential for multi-concept adapters and further optimization.

Paper Structure (12 sections, 11 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Comparison between baseline LoRA and ours. We used the prompts "elephants", "a colorful teapot", and "a thin bird". The baseline (left) exhibits knowledge shift and a lack of diversity. CAT (right) shows no knowledge shift and preserves the model's ability to generate diverse outputs.
  • Figure 2: The basic pipeline of CAT. While the adapter is trained, the CAT loss is calculated between the frozen U-Net [Unet] and the unconditioned adapter activation. Adapters such as LoRA in the attention layers of the U-Net are depicted as orange boxes. All figures share the same parameters unless otherwise specified.
  • Figure 3: Results of text-to-image generation with the prompt "a round bird staring to the right with its beak closed". LoRA fails to preserve the original knowledge and limits its generations to the target image. CAT achieves high-fidelity identity generation while retaining the original knowledge in token-less generation.
  • Figure 4: Qualitative results of various adapter training methods. We used the ETH-80 dataset [eth] for training and validation. The prompts used for generation are "a green sports car", "a cow in a city", and "a dog in a city".
  • Figure 5: Video memory and time consumption of various adapters in training and inference. The red dots show our method's performance. TCO [TCO] uses the Stable Diffusion XL model [SDXL], which requires more VRAM. For Dreambooth [Dreambooth] and TCO, the preparation time for the regularization set is included in the total training time. We plan further optimizations to reduce memory consumption to the level of LoRA [LoRA].
  • ...and 3 more figures
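The pipeline in the Figure 2 caption can be illustrated with a toy sketch. It rests on an assumption, because this summary does not give the paper's formula: the CAT loss is taken to be a mean-squared error between the frozen model's output and the adapter-equipped model's output on the same unconditioned input, added to the usual personalization loss with a hypothetical weight `lam`. A single linear layer with a LoRA update `W x + scale * B (A x)` (where `scale` plays the role of alpha / r) stands in for the U-Net, and `x_uncond` stands in for a noisy latent paired with an empty (token-less) prompt. All function names are hypothetical.

```python
def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, scale):
    """Toy linear layer with a LoRA adapter: W x + scale * B (A x)."""
    base = matvec(w, x)              # frozen weight W
    delta = matvec(b, matvec(a, x))  # low-rank update B A, rank = len(a)
    return [y0 + scale * d for y0, d in zip(base, delta)]


def mse(p, q):
    """Mean squared error between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(p, q)) / len(p)


def cat_loss(w, a, b, x_uncond, scale):
    """Assumed CAT-style term: how far the adapter moves the output on an
    unconditioned input, measured against the frozen (adapter-off) layer."""
    frozen = matvec(w, x_uncond)
    adapted = lora_forward(w, a, b, x_uncond, scale)
    return mse(frozen, adapted)


def total_loss(w, a, b, x_cond, target, x_uncond, scale, lam):
    """Personalization loss on the conditioned input plus lam * CAT term."""
    recon = mse(lora_forward(w, a, b, x_cond, scale), target)
    return recon + lam * cat_loss(w, a, b, x_uncond, scale)


if __name__ == "__main__":
    w = [[1, 0], [0, 1]]  # frozen 2x2 weight
    a = [[1, 1]]          # rank-1 down-projection A (1x2)
    print(cat_loss(w, a, [[0], [0]], [1, 2], 1.0))  # B = 0: prints 0.0
    print(cat_loss(w, a, [[1], [0]], [1, 2], 1.0))  # adapter drift: prints 4.5
```

LoRA initializes B to zero, so in this sketch the CAT term starts at exactly zero and only grows as the adapter begins to shift the unconditioned output away from the frozen model, which is the kind of drift the paper aims to suppress.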