DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization

Aniket Roy, Shubhankar Borse, Shreya Kadambi, Debasmit Das, Shweta Mahajan, Risheek Garrepalli, Hyojin Park, Ankita Nayak, Rama Chellappa, Munawar Hayat, Fatih Porikli

TL;DR

DuoLoRA addresses joint content-style personalization in diffusion models by introducing ZipRank, layer-prior merging, and Constyle cycle-consistency to enable adaptive rank-based merging with far fewer trainable parameters. It provides theoretical guarantees, including the bound $E_{\text{rank}} \le E_{\text{out}}$ (the approximation error of rank-dimension masking is no larger than that of output-dimension masking under an equal parameter budget), and uses a nuclear-norm relaxation with a Lagrangian penalty to enforce rank constraints. Empirically, DuoLoRA outperforms state-of-the-art baselines across multiple benchmarks and is validated via user studies, demonstrating practical efficiency and quality in content-style blending. The approach scales to multi-concept stylization and supports recontextualization, with significant improvements in both objective metrics (DINO, CLIP-I, CLIP-T, CSD-s) and human judgments.
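
For orientation, a nuclear-norm relaxation of a rank constraint typically takes the generic form below. The symbols ($\mathcal{L}$ for the merging loss, $M$ for the constrained matrix, $k$ for the rank budget, $\lambda \ge 0$ for the multiplier) are illustrative assumptions and not necessarily the paper's notation or exact objective.

$$
\min_{M} \mathcal{L}(M) \ \ \text{s.t.}\ \operatorname{rank}(M) \le k
\quad\longrightarrow\quad
\min_{M} \mathcal{L}(M) \ \ \text{s.t.}\ \|M\|_{*} \le k
\quad\longrightarrow\quad
\min_{M} \mathcal{L}(M) + \lambda \left( \|M\|_{*} - k \right)
$$

Here $\|M\|_{*} = \sum_i \sigma_i(M)$ is the sum of singular values, which is the standard convex surrogate for $\operatorname{rank}(M)$ when the spectral norm of $M$ is at most one.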

Abstract

We tackle the challenge of jointly personalizing content and style from a few examples. A promising approach is to train separate Low-Rank Adapters (LoRA) and merge them effectively, preserving both content and style. Existing methods, such as ZipLoRA, treat content and style as independent entities, merging them by learning masks in LoRA's output dimensions. However, content and style are intertwined, not independent. To address this, we propose DuoLoRA, a content-style personalization framework featuring three key components: (i) rank-dimension mask learning, (ii) effective merging via layer priors, and (iii) Constyle loss, which leverages cycle-consistency in the merging process. First, we introduce ZipRank, which performs content-style merging within the rank dimension, offering adaptive rank flexibility and significantly reducing the number of learnable parameters. Additionally, we incorporate SDXL layer priors to apply implicit rank constraints informed by each layer's content-style bias and adaptive merger initialization, enhancing the integration of content and style. To further refine the merging process, we introduce Constyle loss, which leverages the cycle-consistency between content and style. Our experimental results demonstrate that DuoLoRA outperforms state-of-the-art content-style merging methods across multiple benchmarks.
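
The difference between output-dimension and rank-dimension masking can be sketched on a toy LoRA update $\Delta W = BA$. This is a minimal illustration, not the authors' implementation: the function names, the toy shapes, and the exact placement of the masks are assumptions made here for clarity.

```python
# Illustrative sketch only: names, shapes, and mask placement are assumptions,
# not the DuoLoRA or ZipLoRA reference code.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def rank_masked_delta(B, A, m):
    """Delta W = B @ diag(m) @ A: one mask entry per rank component."""
    B_scaled = [[b * m_k for b, m_k in zip(row, m)] for row in B]
    return matmul(B_scaled, A)

def output_masked_delta(B, A, m):
    """Delta W = diag(m) @ (B @ A): one mask entry per output dimension."""
    return [[m_i * w for w in row] for m_i, row in zip(m, matmul(B, A))]

def merge(delta_content, delta_style):
    """Sum the masked content and style updates element-wise."""
    return [[c + s for c, s in zip(rc, rs)] for rc, rs in zip(delta_content, delta_style)]

# Toy content and style LoRA factors with d_out = 3, rank r = 2, d_in = 2.
B_c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A_c = [[1.0, 2.0], [3.0, 4.0]]
B_s = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
A_s = [[1.0, 0.0], [0.0, 1.0]]

# Rank-dimension masks have r entries per LoRA.
merged_rank = merge(
    rank_masked_delta(B_c, A_c, [1.0, 0.0]),
    rank_masked_delta(B_s, A_s, [0.0, 1.0]),
)

# Output-dimension masks have d_out entries per LoRA.
merged_out = merge(
    output_masked_delta(B_c, A_c, [1.0, 0.0, 1.0]),
    output_masked_delta(B_s, A_s, [0.0, 1.0, 0.0]),
)

rank_mask_params = 2 * len(A_c)    # two LoRAs, r entries each
output_mask_params = 2 * len(B_c)  # two LoRAs, d_out entries each
```

In this toy setting the rank masks need 4 learnable entries against 6 for the output masks. Since the LoRA rank is normally much smaller than the layer width, the gap widens in real layers, which is the parameter saving the abstract refers to.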

Paper Structure

This paper contains 22 sections, 4 theorems, 45 equations, 30 figures, 15 tables, 3 algorithms.

Key Result

Theorem 1

In LoRA merging, under the same parameter budget, the approximation error resulting from rank dimension masking is less than or equal to that from output dimension masking. Formally, $E_{\text{rank}} \leq E_{\text{out}},$ where the approximation error using rank dimension masking ($E_{\text{rank}}$)

Figures (30)

  • Figure 1: Content and style personalization using DuoLoRA provides (1) adaptive rank flexibility, (2) significantly fewer trainable parameters, and (3) better content-style merging.
  • Figure 2: Overview of DuoLoRA. It consists of three components: (1) ZipRank, which learns the mask in the rank dimension; (2) layer-prior-based merging, which identifies content-dominant and style-dominant blocks of the SDXL UNet; and (3) cycle-consistency-based merging using the Constyle loss.
  • Figure 3: SDXL selective weight scaling. The scaling parameter ($\alpha$) has been applied to the content-dominant blocks (up_block.2, down_block.2, and mid_block). We observe that with a smaller $\alpha$, the model fails to generate the content and instead focuses solely on the style. In contrast, increasing $\alpha$ allows the model to generate the content specified in the prompt.
  • Figure 4: Histogram of rank across low resolution layers in style merger.
  • Figure 5: Histogram of rank across high resolution layers in content merger.
  • ...and 25 more figures
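
The selective weight scaling described in the Figure 3 caption can be sketched as follows. This is a hypothetical illustration: the block names are taken from the caption as written, while the parameter keys, the flat weight lists, and the function `scale_content_blocks` are assumptions rather than the real SDXL state-dict layout or the authors' code.

```python
# Hypothetical sketch of selective weight scaling; key names and data layout
# are illustrative assumptions, not the actual SDXL parameter names.

# Content-dominant blocks as named in the Figure 3 caption.
CONTENT_BLOCKS = ("up_block.2", "down_block.2", "mid_block")

def scale_content_blocks(weights, alpha, content_blocks=CONTENT_BLOCKS):
    """Return a copy of `weights` with content-dominant entries multiplied by alpha."""
    scaled = {}
    for name, values in weights.items():
        is_content = any(block in name for block in content_blocks)
        factor = alpha if is_content else 1.0
        scaled[name] = [factor * v for v in values]
    return scaled

# Toy weights keyed by made-up layer names.
toy_weights = {
    "down_block.2.attn.to_q": [1.0, -2.0],
    "mid_block.attn.to_k": [0.5],
    "up_block.0.attn.to_v": [3.0],
}

scaled = scale_content_blocks(toy_weights, alpha=0.5)
```

Only the entries under `down_block.2` and `mid_block` are scaled here, and `up_block.0` is left unchanged. Per the caption, a smaller $\alpha$ suppresses the content and a larger $\alpha$ restores it.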

Theorems & Definitions (10)

  • Definition 1
  • Definition 2
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Lemma 2
  • proof