
Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing

Cong Cao, Yujie Xu, Xiaodong Xu

TL;DR

The paper tackles few-shot image style editing by adapting diffusion-based editing models to unseen styles with minimal paired data. It introduces a parameter-efficient multi-style MoE LoRA that combines style-specific and style-shared routing, plus a metric-guided dynamic-rank scheme that automatically prunes low-importance LoRA components, and it analyzes where to place LoRA within the Flux-based DiT model. By integrating adversarial learning and flow matching, the approach achieves better editing quality than the baselines while using only a small fraction of their LoRA parameters. Experiments on a new five-style benchmark show robust performance across both global and local style changes, which matters in practice for rapid, data-efficient style editing. Overall, the work advances efficient, scalable style editing for diffusion models and provides concrete, empirically validated design choices.
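As a rough illustration of the flow-matching objective combined with adversarial learning, the sketch below assumes the rectified-flow convention used by Flux-style models (x_t = (1 - t)·x0 + t·ε, with velocity target ε - x0) and adds a generic non-saturating adversarial term computed on the recovered clean sample. The function names, the weight `lam`, and the choice of adversarial loss are illustrative assumptions; the summary does not give the paper's exact loss formulation.

```python
import numpy as np


def flow_matching_pair(x0, noise, t):
    """Rectified-flow interpolation (assumed convention): x_t = (1 - t) x0 + t * noise.

    Returns x_t and the constant velocity target d x_t / d t = noise - x0.
    t has shape (batch,) and is broadcast over the remaining axes.
    """
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = (1.0 - t) * x0 + t * noise
    return x_t, noise - x0


def predict_clean(x_t, v_pred, t):
    """Invert x_t = x0 + t * v to estimate the clean sample from a predicted velocity."""
    t = t.reshape(-1, *([1] * (x_t.ndim - 1)))
    return x_t - t * v_pred


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))


def generator_adv_loss(fake_logits):
    """Non-saturating GAN generator loss: mean of softplus(-D(x0_hat))."""
    return float(np.mean(np.logaddexp(0.0, -fake_logits)))


def total_loss(v_pred, v_target, fake_logits, lam=0.1):
    # lam is a hypothetical weight balancing the two terms
    return flow_matching_loss(v_pred, v_target) + lam * generator_adv_loss(fake_logits)
```

In a training loop, `v_pred` would come from the LoRA-adapted DiT, `predict_clean` would produce the edited-image estimate fed to a discriminator, and `fake_logits` would be that discriminator's output.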

Abstract

In recent years, image editing has garnered growing attention. However, general image editing models often fail to produce satisfactory results when confronted with new styles. The challenge lies in how to effectively fine-tune general image editing models to new styles using only a limited amount of paired data. To address this issue, this paper proposes a novel few-shot style editing framework. For this task, we construct a benchmark dataset that encompasses five distinct styles. Correspondingly, we propose a parameter-efficient multi-style Mixture-of-Experts Low-Rank Adaptation (MoE LoRA) with style-specific and style-shared routing mechanisms for jointly fine-tuning multiple styles. The style-specific routing ensures that different styles do not interfere with one another, while the style-shared routing adaptively allocates shared MoE LoRAs to learn common patterns. Our MoE LoRA automatically determines the optimal rank for each layer through a novel metric-guided approach that estimates an importance score for each single-rank component. Additionally, we explore the optimal location to insert LoRA within the Diffusion Transformer (DiT) model and integrate adversarial learning and flow matching to guide the diffusion training process. Experimental results demonstrate that our proposed method outperforms existing state-of-the-art approaches with significantly fewer LoRA parameters.
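The abstract does not spell out the importance metric, so the following is only a hypothetical stand-in for metric-guided dynamic rank: it scores each single-rank component b_i a_i^T of a LoRA update ΔW = BA by its Frobenius norm (‖b_i‖·‖a_i‖) and keeps the smallest set of components that retains a chosen fraction of the total score. `component_importance`, `prune_ranks`, and `keep_ratio` are illustrative names, not the authors' API.

```python
import numpy as np


def component_importance(A, B):
    """Score each rank-1 component b_i a_i^T of Delta W = B @ A.

    A: (r, d_in), B: (d_out, r). Hypothetical metric: the Frobenius norm
    of the component, which factorizes as ||B[:, i]|| * ||A[i, :]||.
    """
    return np.linalg.norm(B, axis=0) * np.linalg.norm(A, axis=1)


def prune_ranks(A, B, keep_ratio=0.9, min_rank=1):
    """Keep the fewest components whose scores sum to >= keep_ratio of the total."""
    scores = component_importance(A, B)
    total = scores.sum()
    order = np.argsort(scores)[::-1]  # most important first
    if total == 0:
        k = min_rank
    else:
        cum = np.cumsum(scores[order]) / total
        # first prefix length whose cumulative share reaches keep_ratio
        k = int(np.searchsorted(cum, keep_ratio)) + 1
    k = max(min_rank, min(k, len(scores)))
    keep = np.sort(order[:k])  # restore the original component order
    return A[keep, :], B[:, keep], keep
```

Applied independently to each layer, this yields a different rank per layer. A gradient-based sensitivity score (as used in AdaLoRA) could be swapped in for the norm without changing the pruning logic.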

Paper Structure

This paper contains 19 sections, 15 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Examples of five styles in our datasets.
  • Figure 2: The framework of the proposed method. We propose a parameter-efficient multi-style MoE LoRA with style-specific and style-shared routing. Our MoE LoRA can automatically determine the optimal ranks for each layer with metric-guided dynamic rank.
  • Figure 3: Comparison of the importance of the double-stream denoising transformer block (DSTB) and the single-stream denoising transformer block (SSTB).
  • Figure 4: Visual quality comparison on our dataset. Zoom in for better observation.
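Figure 2 describes a MoE LoRA with style-specific and style-shared routing. A minimal NumPy sketch of one such layer follows, assuming (hypothetically) that style-specific routing is a hard lookup of one LoRA expert per style ID and that style-shared routing is a softmax top-k gate over a pool of shared LoRA experts, with both outputs added to a frozen base projection. The class name, gate design, and initialization are assumptions for illustration only.

```python
import numpy as np


class MultiStyleMoELoRA:
    """Hypothetical sketch: frozen base weight + one LoRA expert per style
    (hard, style-specific routing) + shared LoRA experts mixed by a
    softmax top-k gate (style-shared routing)."""

    def __init__(self, d_in, d_out, n_styles, n_shared, rank, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        # stands in for a pretrained, frozen projection
        self.W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # usual LoRA init: A small random, B zero, so Delta W = 0 at start
        self.A_style = rng.standard_normal((n_styles, rank, d_in)) * 0.01
        self.B_style = np.zeros((n_styles, d_out, rank))
        self.A_shared = rng.standard_normal((n_shared, rank, d_in)) * 0.01
        self.B_shared = np.zeros((n_shared, d_out, rank))
        self.W_gate = rng.standard_normal((n_shared, d_in)) * 0.01
        self.top_k = top_k

    def gate(self, x):
        """x: (n_tokens, d_in) -> (n_tokens, n_shared) weights, nonzero on top-k only."""
        logits = x @ self.W_gate.T
        kth = np.sort(logits, axis=1)[:, -self.top_k][:, None]
        # ties at the k-th value could keep more than k experts
        masked = np.where(logits >= kth, logits, -np.inf)
        masked = masked - masked.max(axis=1, keepdims=True)
        w = np.exp(masked)  # exp(-inf) = 0 for dropped experts
        return w / w.sum(axis=1, keepdims=True)

    def __call__(self, x, style_id):
        y = x @ self.W0.T
        # style-specific branch: only this style's expert is used
        y += (x @ self.A_style[style_id].T) @ self.B_style[style_id].T
        # style-shared branch: gated mixture of shared experts
        g = self.gate(x)  # (n, K)
        h = np.einsum("kri,ni->knr", self.A_shared, x)  # (K, n, r)
        h = np.einsum("kor,knr->kno", self.B_shared, h)  # (K, n, d_out)
        y += np.einsum("nk,kno->no", g, h)
        return y
```

Because the style-specific branch is selected by `style_id` alone, updating one style's expert leaves every other style's output unchanged, which mirrors the non-interference property claimed for style-specific routing; cross-style patterns would be learned in the shared experts.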