It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

TL;DR

This work addresses the problem that existing sketch-conditioned diffusion models produce deformed outputs when guided by freehand sketches without textual prompts. It introduces an abstraction-aware framework comprising a sketch adapter that converts sketches into text-like embeddings, adaptive time-step sampling that accounts for sketch abstraction, and discriminative FG-SBIR guidance that preserves fine-grained sketch-photo fidelity. With training-time supervision from synthetic captions and a super-concept preservation loss, the method produces photorealistic results that closely follow user sketches across diverse inputs, and it requires no prompts at inference. Extensive experiments on Sketchy show superior quantitative metrics and user-perceived quality, indicating practical value for democratising sketch-driven image synthesis.

Abstract

This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. We importantly democratise the process, enabling amateur sketches to generate precise images, living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from spatial-conditioning. To rectify this, we propose an abstraction-aware framework, utilising a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval model, working synergistically to reinforce fine-grained sketch-photo association. Our approach operates seamlessly during inference without the need for textual prompts; a simple, rough sketch akin to what you and I can create suffices! We welcome everyone to examine results presented in the paper and its supplementary. Contributions include democratising sketch control, introducing an abstraction-aware framework, and leveraging discriminative guidance, validated through extensive experiments.

Paper Structure (14 sections, 8 equations, 19 figures, 2 tables)

Figures (19)

  • Figure 1: Images generated by T2I-Adapter [mou2023t2i] for different sketch-guidance factors ($\omega\in [0,1]$). Finding the $\omega$ that gives the ideal balance (green-bordered) between photorealism and sketch-fidelity requires manual intervention and is sample-specific. A high $\omega$ works well for less deformed sketches but produces deformed outputs for abstract sketches, and vice versa.
  • Figure 2: Passing a null prompt (i.e., $\mathtt{"~"}$) to existing sketch-conditioned DMs [voynov2023sketch, zhang2023adding, mou2023t2i] significantly degrades output quality.
  • Figure 3: Our overall training pipeline. (More in the text.)
  • Figure 4: Abstraction-aware $t$-sampling function for different $\omega$.
  • Figure 5: Qualitative comparison with SOTA sketch-to-image generation models on Sketchy [sangkloy2016sketchy]. For ControlNet [zhang2023adding], T2I-Adapter [mou2023t2i], and PITI [wang2022pretraining], we use the fixed prompt $\mathtt{"a~photo~of~[CLASS]"}$, with $\mathtt{[CLASS]}$ replaced by the corresponding class labels of the input sketches.
  • ...and 14 more figures
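
To make the idea behind the Figure 4 caption more concrete, here is a minimal sketch of a diffusion time-step sampler whose skew is controlled by $\omega$. The cosine-skewed density $p(t) \propto 1 - \omega\cos(\pi t / T)$, the function names, and the choice of $T = 1000$ are all assumptions made for illustration. This summary does not state the paper's actual sampling function, so the real one may differ.

```python
import math
import random


def skewed_timestep_probs(num_steps, omega):
    """Probabilities over t = 1..num_steps.

    Assumed (hypothetical) density: p(t) proportional to
    1 - omega * cos(pi * t / num_steps). This is an illustrative choice,
    not a formula taken from the paper summary.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    weights = [
        1.0 - omega * math.cos(math.pi * t / num_steps)
        for t in range(1, num_steps + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def mean_timestep(num_steps, omega):
    """Expected time step under the assumed density."""
    probs = skewed_timestep_probs(num_steps, omega)
    return sum(t * p for t, p in zip(range(1, num_steps + 1), probs))


def sample_timesteps(num_steps, omega, k, seed=0):
    """Draw k time steps from the assumed omega-skewed distribution."""
    rng = random.Random(seed)
    probs = skewed_timestep_probs(num_steps, omega)
    return rng.choices(range(1, num_steps + 1), weights=probs, k=k)


if __name__ == "__main__":
    T = 1000  # assumed number of diffusion steps
    for omega in (0.0, 0.5, 1.0):
        print(omega, round(mean_timestep(T, omega), 1))
    print(sample_timesteps(T, omega=0.8, k=5))
```

Under this assumed density, $\omega = 0$ gives uniform sampling, and increasing $\omega$ shifts probability mass toward larger $t$. How $\omega$ is derived from a sketch's abstraction level is not specified in this summary and is left out of the sketch.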