Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example

Kwan Yun, Youngseo Kim, Kwanggyoon Seo, Chang Wook Seo, Junyong Noh

TL;DR

DiffSketch tackles personalized sketch extraction from images with minimal data: it selects representative denoising diffusion features from a pretrained diffusion model and fuses them with VAE features to train a sketch generator from a single example. It introduces a two-level diffusion feature aggregation network, a dedicated training objective with directional CLIP losses, and a novel condition diffusion sampling strategy that promotes diversity while maintaining CLIP guidance. The trained model is then distilled into a fast image-to-sketch translator, DiffSketch_distilled, for efficient inference. Empirical results on BSDS500 and COCO show superior performance to strong baselines, and perceptual studies confirm user preference for the approach, highlighting its potential for personalized sketching and related downstream tasks. The work also discusses limitations and suggests extending representative diffusion feature extraction to other diffusion-based tasks such as segmentation and visual correspondence.
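
To make the idea of a directional CLIP loss concrete, the sketch below shows one commonly used form: one minus the cosine similarity between the change in image embeddings and the change in reference embeddings. This is an illustration under assumptions, not the paper's exact objective, which this summary does not spell out. The function name `directional_loss` and its arguments are hypothetical, and the inputs are plain vectors standing in for real CLIP embeddings.

```python
import numpy as np

def directional_loss(src_emb, gen_emb, ref_src_emb, ref_tgt_emb, eps=1e-8):
    """Return 1 - cos(angle) between two embedding-space directions.

    src_emb, gen_emb:         embeddings of the input image and generated sketch
    ref_src_emb, ref_tgt_emb: embeddings of a reference pair (e.g. image and
                              its hand-drawn sketch); all are placeholders here.
    """
    # Direction the generator actually moved in embedding space.
    d_gen = np.asarray(gen_emb, dtype=float) - np.asarray(src_emb, dtype=float)
    # Direction we would like it to move (defined by the reference pair).
    d_ref = np.asarray(ref_tgt_emb, dtype=float) - np.asarray(ref_src_emb, dtype=float)
    cos = d_gen @ d_ref / (np.linalg.norm(d_gen) * np.linalg.norm(d_ref) + eps)
    return 1.0 - cos

# Toy vectors: the generated direction is parallel to the reference direction.
aligned = directional_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [2.0, 0.0])
# Toy vectors: the generated direction is orthogonal to the reference direction.
orthogonal = directional_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 3.0])
```

The loss depends only on the direction of change, not its magnitude, so it is near zero when the two directions are parallel and grows as they diverge.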

Abstract

We introduce DiffSketch, a method for generating a variety of stylized sketches from images. Our approach focuses on selecting representative features from the rich semantics of deep features within a pretrained diffusion model. This novel sketch generation method can be trained with one manual drawing. Furthermore, efficient sketch extraction is ensured by distilling a trained generator into a streamlined extractor. We select denoising diffusion features through analysis and integrate these selected features with VAE features to produce sketches. Additionally, we propose a sampling scheme for training models using a conditional generative approach. Through a series of comparisons, we verify that distilled DiffSketch not only outperforms existing state-of-the-art sketch extraction methods but also surpasses diffusion-based stylization methods in the task of extracting sketches.

Paper Structure (25 sections, 8 equations, 18 figures, 6 tables, 1 algorithm)

Figures (18)

  • Figure 1: Results of DiffSketch and distilled $\text{DiffSketch}_{distilled}$, trained with one example. The left sketches were generated by DiffSketch, while the right sketches were extracted from images using $\text{DiffSketch}_{distilled}$.
  • Figure 2: Analysis of sampled features. PCA is applied to DDIM-sampled features from different classes. (a) Features colored by human-labeled class. (b) Features colored by denoising timestep.
  • Figure 3: Visualization of features from UNet and VAE in lower and higher resolution layers. Lower resolution layers are the first layers while higher resolution layers are the 11th for UNet and the 9th for VAE.
  • Figure 4: Overview of DiffSketch. The UNet features generated during the denoising process are fed to the aggregation networks and fused with the VAE features to generate a sketch corresponding to the image that Stable Diffusion generates.
  • Figure 5: Visual examples from the ablation study. Our method generates higher-quality results than the alternatives, with details such as the face clearly separated from the hair region.
  • ...and 13 more figures
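
The kind of analysis described in the Figure 2 caption (PCA applied to features collected across denoising timesteps) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the helper `pca_project`, the feature dimensions, and the artificial per-timestep shift are all assumptions made for the example.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto their top principal components."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)           # center each dimension
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    coords = x @ vt[:n_components].T                # low-dimensional coordinates
    explained = s**2 / np.sum(s**2)                 # variance ratio per component
    return coords, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in for sampled features: 3 timesteps x 50 samples, 64 dims.
timesteps = np.repeat(np.arange(3), 50)
# Assumption for illustration only: features drift with the timestep index.
feats = rng.normal(size=(150, 64)) + 5.0 * timesteps[:, None]

coords, ratio = pca_project(feats)
# Mean position of each timestep group along the first principal component.
group_means = [coords[timesteps == t, 0].mean() for t in range(3)]
```

Coloring `coords` by `timesteps` (or by class label) in a scatter plot would give a view analogous to panels (a) and (b); with real diffusion features the structure may of course look different.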