StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
Ruojun Xu, Weijie Xi, Xiaodi Wang, Yongbo Mao, Zach Cheng
TL;DR
StyleSSP tackles two persistent problems in training-free diffusion-based style transfer: unwanted changes to the layout of the content image and leakage of content from the style image. It improves the startpoint of the sampling stage with two components, Frequency Manipulation and Negative Guidance via Inversion, which together improve content preservation and decouple style from content; ControlNet provides targeted content control and IP-Instruct handles style extraction. Experiments on MS-COCO and WikiArt against multiple baselines show improvements in ArtFID, FID, and LPIPS, supported by qualitative comparisons and ablations of the two core components. The result is a practical, training-free method that yields sharper content structures and more faithful style transfer, with potential extensions to region-aware startpoint strategies.
Abstract
Training-free diffusion-based methods have achieved remarkable success in style transfer, eliminating the need for extensive training or fine-tuning. However, because they lack targeted training for style information extraction and constraints on the content image layout, training-free methods often suffer from layout changes to the original content and content leakage from the style image. Through a series of experiments, we discovered that an effective startpoint in the sampling stage significantly enhances the style transfer process. Based on this discovery, we propose StyleSSP, which focuses on obtaining a better startpoint to address both layout changes to the original content and content leakage from the style image. StyleSSP comprises two key components: (1) Frequency Manipulation: to improve content preservation, we reduce the low-frequency components of the DDIM latent, allowing the sampling stage to pay more attention to the layout of the content image; and (2) Negative Guidance via Inversion: to mitigate content leakage from the style image, we employ negative guidance in the inversion stage to ensure that the startpoint of the sampling stage is distanced from the content of the style image. Experiments show that StyleSSP surpasses previous training-free style transfer baselines, particularly in preserving the original content and minimizing content leakage from the style image. Project page: https://github.com/bytedance/StyleSSP.
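The two components named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (attenuate_low_frequencies, negative_guided_noise), the circular low-frequency mask, and the parameters radius_ratio, scale, and guidance_scale are all hypothetical choices made for illustration. The guidance formula is the standard classifier-free-guidance form, assumed here as a stand-in for whatever formulation the paper actually uses in the inversion stage.

```python
import numpy as np


def attenuate_low_frequencies(latent, radius_ratio=0.25, scale=0.5):
    """Scale down the low-frequency band of a (C, H, W) latent.

    Assumption: a circular mask around the DC component and a constant
    gain; the paper's actual filter and strength may differ.
    """
    _, h, w = latent.shape
    # Per-channel 2D spectrum, shifted so the DC component sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(latent, axes=(-2, -1)), axes=(-2, -1))
    # Distance of every frequency bin from the center of the spectrum.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_band = dist <= radius_ratio * min(h, w) / 2
    # Multiply low-frequency bins by `scale`; leave the rest untouched.
    gain = np.where(low_band, scale, 1.0)
    filtered = spectrum * gain
    out = np.fft.ifft2(np.fft.ifftshift(filtered, axes=(-2, -1)), axes=(-2, -1))
    return out.real.astype(latent.dtype)


def negative_guided_noise(eps_positive, eps_negative, guidance_scale=2.0):
    """Combine two noise predictions so the result moves away from the negative one.

    Assumption: classifier-free-guidance form. In the setting described in
    the abstract, eps_negative would be the prediction conditioned on the
    content of the style image, applied at each inversion step.
    """
    return eps_negative + guidance_scale * (eps_positive - eps_negative)


# Toy usage with a random array standing in for a DDIM-inverted latent.
rng = np.random.default_rng(0)
toy_latent = rng.standard_normal((4, 8, 8))
startpoint = attenuate_low_frequencies(toy_latent, radius_ratio=0.25, scale=0.5)
```

With scale=1.0 the filter is an identity, and with guidance_scale=1.0 the combined prediction reduces to eps_positive, so both knobs interpolate smoothly from "no change" to a stronger effect.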
