OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song, Cheng Liu, Mike Zheng Shou
TL;DR
OmniConsistency is a style-agnostic consistency plugin built on diffusion transformers. It decouples style learning from content preservation through a two-stage training regime and a rolling LoRA bank. A lightweight Consistency LoRA attached to the conditioning path, combined with efficient conditioning strategies, preserves identity and fine detail across diverse styles, including LoRAs unseen during training, while keeping text-image alignment competitive. The approach is validated on a dataset of 2,600 image pairs spanning 22 styles, where it demonstrates superior style fidelity and structure preservation at modest computational overhead, and it integrates as a plug-and-play module into diffusion-based stylization pipelines. The work targets controllable, high-fidelity image stylization and provides a basis for further research on consistency modeling with diffusion transformers.
Abstract
Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly with respect to identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines that use style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy that decouples style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to that of the commercial state-of-the-art model GPT-4o.
