Table of Contents
Fetching ...

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Yiren Song, Cheng Liu, Mike Zheng Shou

TL;DR

OmniConsistency introduces a diffusion-transformer–based, style-agnostic consistency plugin that decouples style learning from content preservation via a two-stage training regime and a rolling LoRA bank. By attaching a lightweight Consistency LoRA to the conditioning path and using efficient conditioning strategies, it achieves strong identity and detail preservation across diverse styles, including unseen LoRAs, while maintaining competitive text-image alignment. The approach is validated on a 2,600-pair, 22-style dataset and demonstrates superior style fidelity and structure preservation with modest computational overhead, offering practical plug-and-play integration for diffusion-based stylization. The work advances controllable, high-fidelity image stylization suitable for broad deployment and future research in consistency modeling with diffusion transformers.

Abstract

Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose \textbf{OmniConsistency}, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy decoupling style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to commercial state-of-the-art model GPT-4o.

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

TL;DR

OmniConsistency introduces a diffusion-transformer–based, style-agnostic consistency plugin that decouples style learning from content preservation via a two-stage training regime and a rolling LoRA bank. By attaching a lightweight Consistency LoRA to the conditioning path and using efficient conditioning strategies, it achieves strong identity and detail preservation across diverse styles, including unseen LoRAs, while maintaining competitive text-image alignment. The approach is validated on a 2,600-pair, 22-style dataset and demonstrates superior style fidelity and structure preservation with modest computational overhead, offering practical plug-and-play integration for diffusion-based stylization. The work advances controllable, high-fidelity image stylization suitable for broad deployment and future research in consistency modeling with diffusion transformers.

Abstract

Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose \textbf{OmniConsistency}, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy decoupling style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to commercial state-of-the-art model GPT-4o.

Paper Structure

This paper contains 29 sections, 6 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Our method achieves style-consistent and structure-preserving image stylization under diverse scenes and unseen style LoRAs, outperforming existing baselines without style degradation.
  • Figure 2: Illustration of OmniConsistency, consisting of style learning and consistency learning phases. (a) In the style learning phase, individual LoRA modules are trained on dedicated datasets to capture unique stylistic details. (b) The subsequent consistency learning phase optimizes consistency LoRA for structural and detail coherence across diverse stylizations, integrating pre-trained style LoRA dynamically.
  • Figure 3: OmniConsistency can be combined with both seen and unseen style LoRA modules to achieve high-quality image stylization consistency, effectively preserving the semantics, structure, and fine details of the original image.
  • Figure 4: Comparation results of OmniConsistency and baseline methods.
  • Figure 5: Ablation shows that full settings ensure strong stylization and consistency, while removals degrade performance.
  • ...and 6 more figures