Text to Sketch Generation with Multi-Styles
Tengjie Li, Shikui Tu, Lei Xu
TL;DR
This work introduces M3S, a training-free, diffusion-based framework for zero-shot sketch synthesis with explicit multi-style control. It injects reference style features into self-attention through a K/V fusion scheme with linear smoothing, and it uses a style-content guidance mechanism together with a joint AdaIN module to control how strongly the output leans toward each style. The method supports both single- and multi-style generation, preserving content while maintaining high style fidelity and allowing flexible interpolation between styles. Experiments on six sketch datasets show strong text alignment, style consistency, and competitive results in human preference evaluations, with the SDXL-based variants performing especially robustly across diverse artistic styles. The approach is of practical use to artists and designers who need rapid, controllable sketch generation across varied styles.
Abstract
Recent advances in vision-language models have facilitated progress in sketch generation. However, existing specialized methods primarily focus on generic synthesis and lack mechanisms for precise control over sketch styles. In this work, we propose a training-free framework based on diffusion models that enables explicit style guidance via textual prompts and reference style sketches. Unlike previous style transfer methods that overwrite the key and value matrices in self-attention, we incorporate the reference features as auxiliary information with linear smoothing and leverage a style-content guidance mechanism. This design effectively reduces content leakage from reference sketches and enhances synthesis quality, especially when the reference and target sketches have low structural similarity. Furthermore, we extend our framework to support controllable multi-style generation by integrating features from multiple reference sketches, coordinated via a joint AdaIN module. Extensive experiments demonstrate that our approach achieves high-quality sketch generation with accurate style alignment and improved flexibility in style control. The official implementation of M3S is available at https://github.com/CMACH508/M3S.
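
The abstract does not give the exact formulas, so the following NumPy sketch is only an illustration of the two ideas it names, not the M3S implementation. It assumes that "auxiliary information with linear smoothing" can be approximated by appending reference keys and values to the target's own (rather than overwriting them) and linearly blending the result with plain self-attention, and that a "joint AdaIN" can be approximated by standard AdaIN applied with a weighted mix of per-channel statistics from several references. The function names (`fused_attention`, `joint_adain`) and the parameters `lam` and `weights` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention for 2-D arrays of shape (tokens, channels).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def fused_attention(q, k_tgt, v_tgt, k_ref, v_ref, lam=0.5):
    # ASSUMPTION: reference K/V are appended as extra tokens instead of
    # replacing the target K/V, and the output is a linear blend with
    # ordinary self-attention. lam=0 recovers plain self-attention.
    out_self = attention(q, k_tgt, v_tgt)
    k_aux = np.concatenate([k_tgt, k_ref], axis=0)
    v_aux = np.concatenate([v_tgt, v_ref], axis=0)
    out_aux = attention(q, k_aux, v_aux)
    return (1.0 - lam) * out_self + lam * out_aux


def joint_adain(x, refs, weights, eps=1e-5):
    # Standard AdaIN re-normalizes x to the reference's per-channel mean/std.
    # ASSUMPTION: for several references, the target statistics are a
    # weighted average of each reference's statistics.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu_ref = sum(wi * r.mean(axis=0) for wi, r in zip(w, refs))
    sd_ref = sum(wi * r.std(axis=0) for wi, r in zip(w, refs))
    mu_x = x.mean(axis=0)
    sd_x = x.std(axis=0) + eps
    return sd_ref * (x - mu_x) / sd_x + mu_ref


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8))        # target queries
    k_t = rng.normal(size=(4, 8))      # target keys
    v_t = rng.normal(size=(4, 8))      # target values
    k_r = rng.normal(size=(6, 8))      # reference-sketch keys
    v_r = rng.normal(size=(6, 8))      # reference-sketch values
    print(fused_attention(q, k_t, v_t, k_r, v_r, lam=0.3).shape)

    feats = rng.normal(size=(16, 8))
    style_a = rng.normal(size=(16, 8)) * 2.0 + 1.0
    style_b = rng.normal(size=(16, 8)) * 0.5 - 1.0
    # Equal weights interpolate halfway between the two styles' statistics.
    print(joint_adain(feats, [style_a, style_b], [0.5, 0.5]).shape)
```

In this sketch, varying `weights` moves the output statistics between the reference styles, which is one simple way to picture the style interpolation mentioned in the abstract. The official repository should be consulted for the actual formulation.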
