OmniStyle2: Learning to Stylize by Learning to Destylize

Ye Wang, Zili Yi, Yibo Zhang, Peng Zheng, Xuping Xie, Jiang Lin, Yijun Li, Yilin Wang, Rui Ma

Abstract

This paper introduces a scalable paradigm for supervised style transfer by inverting the problem: instead of learning to stylize directly, we learn to destylize, reducing stylistic elements from artistic images to recover their natural counterparts and thereby producing authentic, pixel-aligned training pairs at scale. To realize this paradigm, we propose DeStylePipe, a progressive, multi-stage destylization framework that begins with global general destylization, advances to category-wise instruction adaptation, and ultimately deploys specialized model adaptation for complex styles that prompt engineering alone cannot handle. Tightly integrated into this pipeline, DestyleCoT-Filter employs Chain-of-Thought reasoning to assess content preservation and style removal at each stage, routing challenging samples forward while discarding persistently low-quality pairs. Built on this framework, we construct DeStyle-350K, a large-scale dataset aligning diverse artistic styles with their underlying content. We further introduce BCS-Bench, a benchmark featuring balanced content generality and style diversity for systematic evaluation. Extensive experiments demonstrate that models trained on DeStyle-350K achieve superior stylization quality, validating destylization as a reliable and scalable supervision paradigm for style transfer.
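
The staged routing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Scores`, `passes`, and `run_pipeline`, the [0, 1] score range, the 0.7 threshold, and the rule that both scores must clear the threshold are all assumptions made for illustration. The stages and the filter are passed in as plain callables, standing in for the three destylization stages and for DestyleCoT-Filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Scores:
    """Hypothetical filter output; both scores are assumed to lie in [0, 1]."""
    content_preservation: float
    style_removal: float


Stage = Callable[[str], str]           # artistic image -> destylized candidate
Filter = Callable[[str, str], Scores]  # (artistic, candidate) -> scores


def passes(scores: Scores, threshold: float) -> bool:
    # Assumption: a pair is kept only if BOTH dimensions clear the threshold.
    return min(scores.content_preservation, scores.style_removal) >= threshold


def run_pipeline(
    images: Sequence[str],
    stages: Sequence[Stage],
    score: Filter,
    threshold: float = 0.7,  # illustrative value, not taken from the paper
) -> Tuple[List[Tuple[str, str, int]], List[str]]:
    """Route each image through the stages until one candidate passes.

    Returns (accepted, discarded): accepted holds (artistic, destylized,
    stage index) triples; discarded holds images that failed every stage.
    """
    accepted: List[Tuple[str, str, int]] = []
    pending = list(images)
    for stage_idx, stage in enumerate(stages):
        failed: List[str] = []
        for image in pending:
            candidate = stage(image)
            if passes(score(image, candidate), threshold):
                accepted.append((image, candidate, stage_idx))
            else:
                failed.append(image)  # routed forward to the next stage
        pending = failed
    return accepted, pending  # whatever is still pending is discarded


# Toy stand-ins: image ids are strings and each "stage" only tags its input.
stages = [lambda img, k=k: f"{img}@stage{k}" for k in range(3)]
solved_at = {"sketch": 0, "ukiyo-e": 1, "cubism": 2}  # made-up difficulty table


def toy_score(image: str, candidate: str) -> Scores:
    ok = candidate.endswith(f"@stage{solved_at.get(image, -1)}")
    return Scores(0.9, 0.9) if ok else Scores(0.9, 0.2)


pairs, dropped = run_pipeline(
    ["sketch", "ukiyo-e", "cubism", "glitch"], stages, toy_score
)
```

In this toy run, each image is accepted at the first stage whose output passes the filter, and an image that fails all three stages ends up in `dropped`. This mirrors the "route challenging samples forward, discard persistently low-quality pairs" behavior at a purely schematic level.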

Paper Structure

This paper contains 13 sections, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Diverse stylization results of our method. Our framework generates high-fidelity stylization results across a wide range of artistic styles at 1K resolution.
  • Figure 2: Stylization-based vs. destylization-based data generation pipelines.
  • Figure 3: Overview of DeStylePipe framework. Starting from multi-source artistic image collection, a three-stage progressive destylization pipeline is applied, with DestyleCoT-Filter after each stage to assess quality and route failed cases forward.
  • Figure 4: Overview of DestyleCoT-Filter. A CoT-driven multi-dimensional filtering framework scoring across content preservation and style removal to ensure data quality.
  • Figure 5: Representative samples of DeStyle-350K. Our dataset spans six major style categories with over 500 fine-grained subcategories. Each triplet shows (left to right) the destylized image, reference image, and style image.
  • ...and 6 more figures