StyleShot: A Snapshot on Any Style
Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao
TL;DR
StyleShot addresses generalized style transfer without test-time tuning. It introduces a dedicated style-aware encoder that uses multi-scale patch embeddings and a Mixture-of-Experts, and pairs it with a content-fusion encoder that decouples content from style. A style-balanced dataset, StyleGallery, and a new benchmark, StyleBench, support robust training and evaluation across open-domain styles, from 3D and flat styles to fine-grained textures. The approach achieves state-of-the-art performance in both text-driven and image-driven stylization without test-time style tuning, as shown by qualitative comparisons, human preference studies, and CLIP-based metrics. Together, these contributions offer a practical, scalable path to flexible, high-fidelity style transfer with diffusion-based image generators. The work also highlights the importance of balanced training data and of explicitly decoupling content and style for generalization.
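The CLIP-based metrics mentioned above generally score alignment as the cosine similarity between CLIP embeddings, for example between a generated image and its text prompt, or between a generated image and the style reference. The following is a minimal sketch of that scoring step only, on toy vectors. It assumes the embeddings have already been produced by a CLIP encoder. The helper `clip_alignment_scores` and its key names are hypothetical and are not taken from the StyleShot code or its exact evaluation protocol.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def clip_alignment_scores(
    generated: Sequence[float],
    text_emb: Sequence[float],
    style_emb: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper: text alignment and style-image alignment.

    All three inputs are assumed to be CLIP embeddings computed elsewhere.
    """
    return {
        "text_score": cosine_similarity(generated, text_emb),
        "image_score": cosine_similarity(generated, style_emb),
    }


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real CLIP embeddings.
    scores = clip_alignment_scores([0.2, 0.9, 0.1], [0.1, 1.0, 0.0], [0.9, 0.3, 0.2])
    print(scores)
```

In practice such scores would be averaged over many generated images, and higher values indicate closer alignment with the prompt or the style reference.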
Abstract
In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this by constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With a dedicated design for style learning, the style-aware encoder is trained with a decoupling training strategy to extract expressive style representations, and StyleGallery enables its generalization ability. We further employ a content-fusion encoder to enhance image-driven style transfer. We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles, e.g., 3D, flat, abstract, or even fine-grained styles, without test-time tuning. Rigorous experiments validate that StyleShot achieves superior performance across a wide range of styles compared to existing state-of-the-art methods. The project page is available at: https://styleshot.github.io/.
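To make the idea of a style-aware encoder built from multi-scale patch embeddings and a Mixture-of-Experts more concrete, here is a toy, dependency-free sketch. It is not StyleShot's architecture. The (mean, std) patch features stand in for learned patch embeddings. The design in which each patch scale gets its own expert, with outputs combined through a softmax gate, is an assumption made purely for illustration. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def patchify(image: Matrix, p: int) -> List[List[float]]:
    """Split an H x W grid into non-overlapping p x p patches, each flattened."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image size must be divisible by the patch size")
    return [
        [image[i + di][j + dj] for di in range(p) for dj in range(p)]
        for i in range(0, h, p)
        for j in range(0, w, p)
    ]


def patch_features(patch: Sequence[float]) -> List[float]:
    """Toy stand-in for a learned patch embedding: (mean, std) of pixel values."""
    mean = sum(patch) / len(patch)
    var = sum((x - mean) ** 2 for x in patch) / len(patch)
    return [mean, math.sqrt(var)]


def pooled_scale_embedding(image: Matrix, p: int) -> List[float]:
    """Average the patch features over all patches at one scale."""
    feats = [patch_features(q) for q in patchify(image, p)]
    dim = len(feats[0])
    return [sum(f[d] for f in feats) / len(feats) for d in range(dim)]


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product used as a minimal 'expert'."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_moe(
    image: Matrix,
    patch_sizes: Sequence[int],
    experts: Sequence[Matrix],
    gate_weights: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """One expert per patch scale; a softmax gate mixes the expert outputs."""
    scale_embs = [pooled_scale_embedding(image, p) for p in patch_sizes]
    expert_outs = [linear(W, e) for W, e in zip(experts, scale_embs)]
    # Gate logit for each scale is a dot product with that scale's embedding.
    logits = [sum(g * x for g, x in zip(gw, e)) for gw, e in zip(gate_weights, scale_embs)]
    gates = softmax(logits)
    dim = len(expert_outs[0])
    style = [sum(g * out[d] for g, out in zip(gates, expert_outs)) for d in range(dim)]
    return style, gates


if __name__ == "__main__":
    # 4x4 image: left half 0, right half 1.
    img = [[0.0 if j < 2 else 1.0 for j in range(4)] for _ in range(4)]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    style, gates = multi_scale_moe(img, [2, 4], [identity, identity], [[0.0, 0.0], [0.0, 0.0]])
    print(style, gates)
```

In the usage example, the 2x2 patches are all uniform (std 0), while the single 4x4 patch has std 0.5. Different scales therefore expose different statistics, and the gate decides how much each scale contributes to the final style vector.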
