MixSA: Training-free Reference-based Sketch Extraction via Mixture-of-Self-Attention
Rui Yang, Xiaojun Wu, Shengfeng He
TL;DR
This work tackles training-free sketch extraction from color images in the style of an arbitrary reference sketch. It introduces Mixture-of-Self-Attention (MixSA), which injects brushstroke features from a reference sketch into the late decoder layers of a latent diffusion model, while a Decomposing Contours and Texture module separates texture from contours. By exposing parameters for texture density and adherence to the reference, MixSA supports interpolation between styles and mitigates the color-averaging artifacts inherent to diffusion models. Experiments on multiple datasets show higher sketch fidelity and flexibility, and users preferred its outputs, supporting the training-free approach and its usefulness for art and portrait sketching. The approach gives artists and developers a tool for style-consistent sketch extraction without costly retraining or data collection.
Abstract
Current sketch extraction methods either require extensive training or fail to capture a wide range of artistic styles, limiting their practical applicability and versatility. We introduce Mixture-of-Self-Attention (MixSA), a training-free sketch extraction method that leverages strong diffusion priors for enhanced sketch perception. At its core, MixSA employs a mixture-of-self-attention technique, which manipulates self-attention layers by substituting the keys and values with those from reference sketches. This integrates brushstroke elements into initial outline images, offers precise control over texture density, and enables interpolation between styles to create novel ones. MixSA aligns brushstroke styles with the texture and contours of color images, particularly in the late decoder layers that handle local textures, and addresses the common issue of color averaging by adjusting the initial outlines. Evaluated with various perceptual metrics, MixSA demonstrates superior performance in sketch quality, flexibility, and applicability. This approach overcomes the limitations of existing methods and lets users generate diverse, high-fidelity sketches that more accurately reflect a wide range of artistic expressions.
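The key-and-value substitution described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single-head 2-D layout, and the `alpha` blending weight are assumptions introduced here for illustration, and the abstract does not state how MixSA actually combines the two attention outputs. The sketch keeps the queries from the image being processed, computes one attention pass over the image's own keys and values and one over the reference sketch's keys and values, then mixes the two results linearly.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention for 2-D arrays of shape (tokens, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def mixed_self_attention(q_content, k_content, v_content, k_ref, v_ref, alpha=1.0):
    """Blend ordinary self-attention with attention over reference keys/values.

    `alpha` is a hypothetical knob, not a parameter named in the paper:
    alpha=0.0 keeps plain self-attention, alpha=1.0 fully substitutes the
    reference keys and values.
    """
    own = attention(q_content, k_content, v_content)
    ref = attention(q_content, k_ref, v_ref)
    return (1.0 - alpha) * own + alpha * ref


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for features taken from one decoder layer.
    q_c = rng.normal(size=(4, 8))   # queries from the outline image
    k_c = rng.normal(size=(4, 8))   # keys from the outline image
    v_c = rng.normal(size=(4, 8))   # values from the outline image
    k_r = rng.normal(size=(6, 8))   # keys from the reference sketch
    v_r = rng.normal(size=(6, 8))   # values from the reference sketch

    out = mixed_self_attention(q_c, k_c, v_c, k_r, v_r, alpha=0.7)
    print(out.shape)
```

Because the queries come from the content image, the output keeps one row per content token, while the reference only contributes through its keys and values. In a real latent diffusion model this substitution would be applied inside selected self-attention layers (the abstract points to late decoder layers) rather than on random arrays.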
