Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example
Kwan Yun, Youngseo Kim, Kwanggyoon Seo, Chang Wook Seo, Junyong Noh
TL;DR
DiffSketch tackles personalized sketch extraction from images with minimal data: it selects representative denoising diffusion features from a pretrained diffusion model and fuses them with VAE features to train a sketch generator from a single example. It introduces a two-level diffusion feature aggregation network, a dedicated training objective with directional CLIP losses, and a novel condition diffusion sampling strategy that promotes diversity while maintaining CLIP guidance. The trained model is then distilled into a fast image-to-sketch translator, DiffSketch_distilled, enabling efficient inference. Empirical results on BSDS500 and COCO show superior performance to strong baselines, and perceptual studies confirm user preference for the approach, highlighting its potential for personalized sketching and related downstream tasks. The work also discusses limitations and suggests extending representative diffusion feature extraction to other diffusion-based tasks such as segmentation and visual correspondence.
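To make the idea of a directional CLIP loss concrete, here is a minimal sketch of the commonly used form: one minus the cosine similarity between two embedding-space directions (reference image to reference sketch, and new image to generated sketch). The summary does not give the paper's exact formulation, so the function name `directional_loss`, its argument names, and the use of plain Python lists in place of real CLIP embeddings are all assumptions made for illustration.

```python
import math


def _sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def _cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directional_loss(img_ref, sketch_ref, img_new, sketch_new):
    """Hypothetical directional loss on precomputed embeddings.

    All four arguments stand in for CLIP embeddings (assumed inputs):
    the loss is small when the image-to-sketch direction of the new pair
    is aligned with that of the reference pair.
    """
    d_ref = _sub(sketch_ref, img_ref)   # direction: reference image -> reference sketch
    d_new = _sub(sketch_new, img_new)   # direction: new image -> generated sketch
    return 1.0 - _cosine(d_ref, d_new)
```

With this definition the loss ranges from 0 (directions aligned) to 2 (directions opposed). In practice the embeddings would come from a CLIP image encoder rather than hand-written vectors.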
Abstract
We introduce DiffSketch, a method for generating a variety of stylized sketches from images. Our approach focuses on selecting representative features from the rich semantics of deep features within a pretrained diffusion model. This novel sketch generation method can be trained with one manual drawing. Furthermore, efficient sketch extraction is ensured by distilling a trained generator into a streamlined extractor. We select denoising diffusion features through analysis and integrate these selected features with VAE features to produce sketches. Additionally, we propose a sampling scheme for training models using a conditional generative approach. Through a series of comparisons, we verify that distilled DiffSketch not only outperforms existing state-of-the-art sketch extraction methods but also surpasses diffusion-based stylization methods in the task of extracting sketches.
