SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models

Haoyu Zheng; Qifan Yu; Binghe Yu; Yang Dai; Wenqiao Zhang; Juncheng Li; Siliang Tang; Yueting Zhuang

TL;DR

SOYO is a tuning-free, diffusion-based framework for open-domain video style morphing that preserves structural content while smoothly transitioning between two style references. It combines cross-frame style fusion attention, Dual-Style Latent AdaIN, and Adaptive Style Distance Mapping to interpolate style features and color statistics over time, all without fine-tuning the pre-trained model. On the SOYO-Test benchmark, the method outperforms existing approaches in temporal coherence and structural preservation, and it handles diverse scenes and artistic styles. It offers a practical, efficient solution for high-fidelity multi-style video stylization, adding little computational overhead beyond the inversion and diffusion steps.
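
The TL;DR names Dual-Style Latent AdaIN without spelling out its formula. As a rough illustration only, the sketch below assumes it behaves like standard AdaIN applied to a diffusion latent, except that the target channel mean and standard deviation are linearly blended between the two style references using a per-frame weight `alpha`. The function names `channel_stats` and `dual_style_adain`, the flat per-channel latent layout, and the linear blend are hypothetical; the paper's actual formulation may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical latent layout: one flattened list of values per channel.
Latent = List[List[float]]


def channel_stats(latent: Sequence[Sequence[float]], eps: float = 1e-5) -> List[Tuple[float, float]]:
    """Return (mean, std) for each channel; eps keeps std away from zero."""
    stats = []
    for ch in latent:
        mean = sum(ch) / len(ch)
        var = sum((v - mean) ** 2 for v in ch) / len(ch)
        stats.append((mean, math.sqrt(var + eps)))
    return stats


def dual_style_adain(content: Latent, style_a: Latent, style_b: Latent,
                     alpha: float, eps: float = 1e-5) -> Latent:
    """AdaIN whose target statistics are blended between two styles.

    alpha = 0 matches style_a's channel statistics, alpha = 1 matches style_b's.
    The linear blend of mean and std is an assumption made for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out: Latent = []
    for ch, (mc, sc), (ma, sa), (mb, sb) in zip(
        content,
        channel_stats(content, eps),
        channel_stats(style_a, eps),
        channel_stats(style_b, eps),
    ):
        mean_mix = (1.0 - alpha) * ma + alpha * mb
        std_mix = (1.0 - alpha) * sa + alpha * sb
        # Normalize the content channel, then rescale with the blended style stats.
        out.append([std_mix * (v - mc) / sc + mean_mix for v in ch])
    return out


if __name__ == "__main__":
    content = [[0.0, 1.0, 2.0, 3.0]]
    style_a = [[10.0, 10.0, 12.0, 12.0]]  # channel mean 11
    style_b = [[-5.0, -3.0, -5.0, -3.0]]  # channel mean -4
    for alpha in (0.0, 0.5, 1.0):
        frame = dual_style_adain(content, style_a, style_b, alpha)
        print(alpha, round(sum(frame[0]) / len(frame[0]), 6))
```

Sweeping `alpha` from 0 to 1 across frames moves each frame's channel statistics (and hence its color distribution) from the first style toward the second, while the normalized content pattern, which carries the structure, is left unchanged.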

Abstract

Diffusion models have achieved remarkable progress in image and video stylization. However, most existing methods focus on single-style transfer, whereas video stylization involving multiple styles requires seamless transitions between them. We refer to this smooth style transition across video frames as video style morphing. When handling such transitions, current approaches often produce stylized frames with discontinuous structures and abrupt style changes. To address these limitations, we introduce SOYO, a novel diffusion-based framework for video style morphing. Our method uses a pre-trained text-to-image diffusion model without fine-tuning, combining attention injection and AdaIN to preserve structural consistency and enable smooth style transitions across video frames. Moreover, we observe that directly applying linear, equidistant interpolation between the two styles produces imbalanced style morphing. To harmonize the transition across video frames, we propose a novel adaptive sampling scheduler that operates between the two style images. Extensive experiments demonstrate that SOYO outperforms existing methods in open-domain video style morphing, better preserving the structural coherence of video frames while achieving stable and smooth style transitions.
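
The observation that equidistant interpolation is imbalanced can be illustrated with a generic arc-length style reparameterization. Assuming a hypothetical monotone function `style_distance(alpha)` that measures how far a frame stylized with weight `alpha` has drifted from the first style, the sketch below picks per-frame weights so that this distance, rather than `alpha` itself, increases in equal steps. This is a stand-in technique, not necessarily the paper's adaptive scheduler or Adaptive Style Distance Mapping; `adaptive_alphas`, `style_distance`, and `num_probes` are illustrative names.

```python
import bisect
import math
from typing import Callable, List


def adaptive_alphas(style_distance: Callable[[float], float],
                    num_frames: int, num_probes: int = 64) -> List[float]:
    """Choose per-frame interpolation weights so style distance grows in equal steps.

    style_distance(alpha) is assumed to be non-decreasing on [0, 1], where
    alpha = 0 is the first style and alpha = 1 the second.
    """
    if num_frames < 2 or num_probes < 2:
        raise ValueError("need at least two frames and two probes")
    # Sample the distance curve on a uniform grid of alpha values.
    probes = [i / (num_probes - 1) for i in range(num_probes)]
    dists = [style_distance(a) for a in probes]
    d_start, d_end = dists[0], dists[-1]
    if d_end <= d_start:
        # No measurable change: fall back to equidistant weights.
        return [k / (num_frames - 1) for k in range(num_frames)]
    alphas = []
    for k in range(num_frames):
        # Equal steps in distance, not in alpha.
        target = d_start + (d_end - d_start) * k / (num_frames - 1)
        j = bisect.bisect_left(dists, target)
        if j == 0:
            alphas.append(0.0)
        elif j >= num_probes:
            alphas.append(1.0)
        else:
            # Invert the sampled curve by linear interpolation between probes.
            lo_d, hi_d = dists[j - 1], dists[j]
            t = 0.0 if hi_d == lo_d else (target - lo_d) / (hi_d - lo_d)
            alphas.append(probes[j - 1] + t * (probes[j] - probes[j - 1]))
    return alphas


if __name__ == "__main__":
    # Hypothetical curve: style changes quickly at first, then slowly.
    print([round(a, 3) for a in adaptive_alphas(math.sqrt, 5)])
```

For example, if the perceived style change followed `sqrt(alpha)`, equidistant weights would complete half of the transition within the first quarter of the frames. `adaptive_alphas(math.sqrt, 5)` instead returns weights close to `[0, 0.0625, 0.25, 0.5625, 1]`, placing more frames near the first style so that each step covers a similar amount of style change.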

Figures (8)