ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer

Bolin Chen, Baoquan Zhao, Haoran Xie, Yi Cai, Qing Li, Xudong Mao

TL;DR

ConsisLoRA addresses three weaknesses of LoRA-based diffusion style transfer (content inconsistency, style misalignment, and content leakage) by replacing $\epsilon$-prediction with $x_0$-prediction, adopting a two-step training strategy that decouples content and style, and applying a stepwise loss transition that captures both global structure and local details. An inference guidance mechanism enables continuous control over content and style strengths during generation. In qualitative and quantitative evaluations, ConsisLoRA preserves content better, aligns style more closely, and shows less content leakage than state-of-the-art baselines on single-image style transfer. The approach stays parameter-efficient, since only LoRA weights are trained, and targets SDXL-based diffusion models.
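Switching from $\epsilon$-prediction to $x_0$-prediction changes how prediction errors are weighted across timesteps. The sketch below is a minimal illustration under stated assumptions, not ConsisLoRA's code: it assumes a DDPM-style linear noise schedule and obtains the $x_0$ estimate by inverting the forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ from a noise prediction. For the same network error, the $x_0$-space loss equals the $\epsilon$-space loss multiplied by $(1-\bar\alpha_t)/\bar\alpha_t$, so $x_0$-prediction weights high-noise timesteps far more heavily than $\epsilon$-prediction does. The helper names are illustrative, not taken from the paper.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed DDPM-style)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, eps, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def eps_to_x0(x_t, eps_pred, alpha_bar_t):
    """Invert the forward process to turn a noise prediction into an x0 estimate."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def losses(x0, eps, eps_pred, alpha_bar_t):
    """Return (epsilon-space MSE, x0-space MSE) for the same noise prediction."""
    x_t = add_noise(x0, eps, alpha_bar_t)
    x0_pred = eps_to_x0(x_t, eps_pred, alpha_bar_t)
    loss_eps = np.mean((eps_pred - eps) ** 2)
    loss_x0 = np.mean((x0_pred - x0) ** 2)
    return loss_eps, loss_x0

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.standard_normal((4, 8, 8))  # stand-in for a latent image
eps = rng.standard_normal(x0.shape)
eps_pred = eps + 0.1 * rng.standard_normal(x0.shape)  # a slightly wrong noise prediction

for t in (50, 500, 950):
    a = alpha_bar[t]
    l_eps, l_x0 = losses(x0, eps, eps_pred, a)
    # The ratio of the two losses matches (1 - a) / a at every timestep.
    print(t, round(l_eps, 4), round(l_x0, 4), round(l_x0 / l_eps, 2), round((1 - a) / a, 2))
```

Because the ratio depends only on $\bar\alpha_t$, the two objectives differ by a per-timestep weight: small at low noise, very large near the end of the schedule.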

Abstract

Style transfer involves transferring the style from a reference image to the content of a target image. Recent advancements in LoRA-based (Low-Rank Adaptation) methods have shown promise in effectively capturing the style of a single image. However, these approaches still face significant challenges such as content inconsistency, style misalignment, and content leakage. In this paper, we comprehensively analyze the limitations of the standard diffusion parameterization, which learns to predict noise, in the context of style transfer. To address these issues, we introduce ConsisLoRA, a LoRA-based method that enhances both content and style consistency by optimizing the LoRA weights to predict the original image rather than noise. We also propose a two-step training strategy that decouples the learning of content and style from the reference image. To effectively capture both the global structure and local details of the content image, we introduce a stepwise loss transition strategy. Additionally, we present an inference guidance method that enables continuous control over content and style strengths during inference. Through both qualitative and quantitative evaluations, our method demonstrates significant improvements in content and style consistency while effectively reducing content leakage.
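The abstract does not give the exact form of the stepwise loss transition. One possible reading, consistent with the paper's contrast between predicting noise and predicting the original image, is a schedule that measures the content LoRA's error in $x_0$ space for an initial phase of training and in noise space afterwards. The sketch below assumes a hard switch at a hypothetical step `switch_step` and assumes the $x_0$ estimate is obtained by inverting the forward process; both the ordering and the switch point are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def x0_from_eps(x_t, eps_pred, alpha_bar_t):
    """Invert x_t = sqrt(a) * x0 + sqrt(1 - a) * eps to estimate x0 from a noise prediction."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def stepwise_loss(step, switch_step, x0, eps, eps_pred, alpha_bar_t):
    """Hypothetical stepwise loss transition for training a content LoRA.

    For training steps before `switch_step` the error is measured in x0 space;
    from `switch_step` onward it is measured in noise space. The ordering and the
    hard switch are assumptions made for illustration.
    Returns (loss_value, objective_name).
    """
    if step < switch_step:
        x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
        x0_pred = x0_from_eps(x_t, eps_pred, alpha_bar_t)
        return float(np.mean((x0_pred - x0) ** 2)), "x0"
    return float(np.mean((eps_pred - eps) ** 2)), "eps"

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))  # stand-in for a content-image latent
eps = rng.standard_normal(x0.shape)
eps_pred = eps + 0.05 * rng.standard_normal(x0.shape)  # an imperfect noise prediction

objectives = [stepwise_loss(s, 300, x0, eps, eps_pred, 0.5)[1] for s in (0, 299, 300, 999)]
print(objectives)  # ['x0', 'x0', 'eps', 'eps']
```

A smoother variant could blend the two losses with a weight that ramps over several steps; the hard switch is used here only because it is the simplest schedule to read.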

Paper Structure

This paper contains 36 sections, 4 equations, 18 figures, 2 tables.

Figures (18)

  • Figure 1: Style transfer results of our method. Given a content image and a style reference image, ConsisLoRA enables high-fidelity stylized generations that excel in both content preservation and style alignment.
  • Figure 2: Examples of three significant challenges encountered by existing LoRA-based methods: 1) Content inconsistency: the structure of the generated image is inconsistent with that of the content image; 2) Style misalignment: the style of the generated image does not align with that of the style image; 3) Content leakage: content from the style image undesirably leaks into the generated image.
  • Figure 3: Comparison of the average loss across various timestep intervals for different parameterizations of diffusion models.
  • Figure 4: Method Overview. We replace the standard $\epsilon$-prediction with $x_0$-prediction for training both style and content LoRAs. (Bottom-left) For training the content LoRA, we propose a loss transition strategy to capture both the global structure and local details of the content image. (Top) To disentangle the learning of style and content from the style image, we introduce a two-step training strategy: first, learn a content-consistent LoRA using the proposed loss transition, and then, train a style LoRA while keeping the content LoRA fixed.
  • Figure 5: Qualitative comparison. We present style transfer results of our method and four baseline methods: B-LoRA [B-LoRA], ZipLoRA [ziplora], StyleID [styleID], and StyleAligned [styleAligned]. Our method demonstrates superior performance in preserving the structure of the content image while accurately applying the style from the reference style image.
  • ...and 13 more figures
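The two-step strategy in the Figure 4 caption can be illustrated on a toy linear layer. This is a minimal sketch under stated assumptions, not the paper's training code: the "content" and "style" targets are hypothetical low-rank weight offsets, the model is a single linear map rather than SDXL's UNet, and plain gradient descent on an MSE loss stands in for diffusion training. What it does show is the mechanism the caption describes: a content LoRA is fitted first, then held fixed while a style LoRA is trained, so only the style LoRA's `A` and `B` matrices are updated in the second step. The names `make_lora`, `train`, `content_delta`, and `style_delta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank, n = 16, 4, 256

# Frozen base weight plus two hypothetical low-rank offsets in weight space,
# standing in for the style image's content and its style.
W0 = rng.standard_normal((d, d)) / np.sqrt(d)
content_delta = (0.3 * rng.standard_normal((d, rank))) @ (0.3 * rng.standard_normal((rank, d)))
style_delta = (0.3 * rng.standard_normal((d, rank))) @ (0.3 * rng.standard_normal((rank, d)))
X = rng.standard_normal((n, d))  # toy inputs

def make_lora():
    """Standard LoRA init: A random, B zero, so the update B @ A starts at zero."""
    return {"A": 0.3 * rng.standard_normal((rank, d)), "B": np.zeros((d, rank))}

def mse(W, Y):
    return float(np.mean((X @ W.T - Y) ** 2))

def train(lora, frozen_delta, Y, steps=500, lr=0.3):
    """Gradient descent on one LoRA; `frozen_delta` is added to W0 but never updated."""
    history = []
    for _ in range(steps):
        W = W0 + frozen_delta + lora["B"] @ lora["A"]
        history.append(mse(W, Y))
        G = 2.0 * (X @ W.T - Y).T @ X / (n * d)  # dLoss/dW, shape (d, d)
        grad_B = G @ lora["A"].T                 # chain rule through W = ... + B @ A
        grad_A = lora["B"].T @ G
        lora["B"] -= lr * grad_B
        lora["A"] -= lr * grad_A
    return history

# Step 1: fit a content LoRA to the content-only target.
Y_content = X @ (W0 + content_delta).T
content_lora = make_lora()
train(content_lora, frozen_delta=np.zeros((d, d)), Y=Y_content)

# Step 2: freeze the content LoRA and fit a style LoRA to the content + style target.
Y_full = X @ (W0 + content_delta + style_delta).T
frozen_content = content_lora["B"] @ content_lora["A"]
style_lora = make_lora()
history = train(style_lora, frozen_delta=frozen_content, Y=Y_full)
print(f"step-2 loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because the content offset is already explained by the frozen LoRA when step 2 begins, the style LoRA is driven mainly by the remaining residual, which is the intuition behind decoupling the two.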