StyleShot: A Snapshot on Any Style

Junyao Gao; Yanchen Liu; Yanan Sun; Yinhao Tang; Yanhong Zeng; Kai Chen; Cairong Zhao

TL;DR

StyleShot addresses generalized style transfer without test-time tuning. It introduces a dedicated style-aware encoder, built on multi-scale patch embeddings and a Mixture-of-Experts design, together with a content-fusion encoder that decouples content from style. A style-balanced dataset, StyleGallery, and a new benchmark, StyleBench, support training and evaluation across open-domain styles, from 3D and flat styles to fine-grained textures. The approach achieves state-of-the-art performance in both text-driven and image-driven stylization without test-time style tuning, as shown by qualitative comparisons, human preference studies, and CLIP-based metrics. These contributions offer a practical, scalable route to flexible, high-fidelity style transfer with diffusion-based image generators. The work also highlights that balanced training data and explicit decoupling of content and style are important for generalization.
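
To make the idea of multi-scale patch embeddings with per-scale experts more concrete, here is a minimal sketch. It is not StyleShot's implementation. It assumes a single-channel image stored as nested Python lists and hard routing, where each patch size is handled by its own expert (one simple reading of a Mixture-of-Experts design). Fixed toy statistics stand in for learned networks. All names (`split_patches`, `mean_std_expert`, `range_expert`, `multiscale_style_tokens`) are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical types: a single-channel image as rows of pixel intensities,
# and an "expert" that maps one flattened patch to a feature vector.
Image = List[List[float]]
Expert = Callable[[List[float]], List[float]]


def split_patches(img: Image, size: int) -> List[List[float]]:
    """Split img into non-overlapping size x size patches, flattened in row-major order."""
    h, w = len(img), len(img[0])
    if h % size or w % size:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, size):
        for left in range(0, w, size):
            patches.append([img[r][c]
                            for r in range(top, top + size)
                            for c in range(left, left + size)])
    return patches


def mean_std_expert(patch: List[float]) -> List[float]:
    """Toy expert: summarize a patch by its mean and (population) standard deviation."""
    n = len(patch)
    mean = sum(patch) / n
    var = sum((x - mean) ** 2 for x in patch) / n
    return [mean, var ** 0.5]


def range_expert(patch: List[float]) -> List[float]:
    """Toy expert: summarize a patch by its maximum and minimum values."""
    return [max(patch), min(patch)]


def multiscale_style_tokens(img: Image, experts: Dict[int, Expert]) -> List[List[float]]:
    """Route every patch of a given size to that size's expert and concatenate the tokens."""
    tokens: List[List[float]] = []
    for size in sorted(experts):  # smallest patches first
        expert = experts[size]
        tokens.extend(expert(p) for p in split_patches(img, size))
    return tokens


# Usage: an 8x8 checkerboard split into 2x2 and 4x4 patches.
checker = [[float((r + c) % 2) for c in range(8)] for r in range(8)]
tokens = multiscale_style_tokens(checker, {2: mean_std_expert, 4: range_expert})
print(len(tokens))  # 16 patches of 2x2 plus 4 patches of 4x4 -> 20
```

Routing by patch size keeps fine-grained cues (small patches) and layout-level cues (large patches) in separate token groups. In a real encoder each expert would be a trainable network rather than a fixed statistic.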

Abstract

In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this by constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With a dedicated design for style learning, the style-aware encoder is trained with a decoupling training strategy to extract expressive style representations, while StyleGallery provides its generalization ability. We further employ a content-fusion encoder to enhance image-driven style transfer. We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles, e.g., 3D, flat, abstract, or even fine-grained styles, without test-time tuning. Rigorous experiments validate that StyleShot achieves superior performance across a wide range of styles compared to existing state-of-the-art methods. The project page is available at: https://styleshot.github.io/.
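
The abstract credits part of the encoder's quality to a decoupling training strategy, and the paper's outline lists a De-stylization step. One simple way to picture decoupling is to strip style descriptions from training captions. The text then conditions only on content, and the style signal has to come from the style-aware encoder. The sketch below assumes that reading and uses plain keyword filtering. `STYLE_TERMS` and `destylize` are hypothetical names, and the list is illustrative. It is not the procedure StyleShot actually uses.

```python
import re
from typing import Iterable

# Hypothetical style vocabulary; a real pipeline would need a much richer filter.
STYLE_TERMS = ("watercolor", "oil painting", "3d render", "flat illustration", "pixel art")


def destylize(caption: str, style_terms: Iterable[str] = STYLE_TERMS) -> str:
    """Remove style phrases from a caption so that it describes content only."""
    text = caption
    # Remove longer phrases first so that overlapping terms do not leave fragments.
    for term in sorted(style_terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s*,\s*(?:,\s*)+", ", ", text)  # collapse commas left by removals
    text = re.sub(r"\s{2,}", " ", text)  # collapse repeated whitespace
    return text.strip(" ,")


print(destylize("A red fox in a forest, watercolor, oil painting"))
# -> A red fox in a forest
```

Keyword filtering is brittle. It misses paraphrases and can leave dangling words, so a hand-written list like this only illustrates the goal of content-only captions.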

Paper Structure (26 sections, 8 equations, 63 figures, 6 tables)

Introduction
Related Work
Method
  Preliminary
  Style-aware Encoder
  Content-fusion encoder
  StyleGallery & De-stylization
Experiments
  Style Evaluation Benchmark
  Qualitative Results
  Quantitative Results
  Ablation Studies
Conclusion
Style Evaluation Benchmark
Style Images
...and 11 more sections

Figures (63)

Figure 1: Visualization results of StyleShot for text and image-driven style transfer across six style reference images. Each stylized image is generated by StyleShot without test-time style-tuning, capturing numerous nuances such as colors, textures, illumination and layout.
Figure 2: Comparison of style extraction with the CLIP image encoder (a) and with our style-aware encoder (b).
Figure 3: The overall architecture of our proposed StyleShot.
Figure 4: Attention map from the CLIP image encoder on style reference images.
Figure 5: Illustration of the content input under different settings.
...and 58 more figures