It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
TL;DR
This work addresses the problem that existing sketch-conditioned diffusion models produce deformed outputs when guided by freehand sketches without textual prompts. It introduces an abstraction-aware framework comprising a sketch adapter that converts sketches into text-like embeddings, adaptive time-step sampling that accounts for sketch abstraction, and discriminative guidance from a fine-grained sketch-based image retrieval (FG-SBIR) model to preserve fine-grained sketch-photo fidelity. Trained with supervision from synthetic captions and a super-concept preservation loss, the method produces photorealistic results that closely follow user sketches across diverse inputs, and it requires no text prompt at inference. Extensive experiments on Sketchy show superior quantitative metrics and user-perceived quality, indicating practical value for democratising sketch-driven image synthesis.
Abstract
This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. Importantly, we democratise the process, enabling amateur sketches to generate precise images and living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from spatial conditioning. To rectify this, we propose an abstraction-aware framework that utilises a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval model, which work together to reinforce fine-grained sketch-photo association. Our approach operates at inference without any textual prompt; a simple, rough sketch akin to what you and I can create suffices! We welcome everyone to examine the results presented in the paper and its supplementary material. Contributions include democratising sketch control, introducing an abstraction-aware framework, and leveraging discriminative guidance, validated through extensive experiments.
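To make the idea of adaptive time-step sampling concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hypothetical abstraction score in [0, 1] (0 for a detailed sketch, 1 for a highly abstract one) and assumes that more abstract sketches should draw training time-steps more often from the noisier end of the schedule. The function name `sample_timestep`, the power-law skew, and the constant 3.0 are all illustrative choices that the abstract does not specify.

```python
import random


def sample_timestep(abstraction, num_steps=1000, rng=None):
    """Draw a diffusion time-step in [0, num_steps - 1].

    abstraction: hypothetical score in [0, 1]; 0 = detailed, 1 = very abstract.
    The skew direction and its functional form are assumptions made for
    illustration only.
    """
    if not 0.0 <= abstraction <= 1.0:
        raise ValueError("abstraction must lie in [0, 1]")
    rng = rng or random.Random()
    # exponent == 1 gives uniform sampling; a smaller exponent pushes
    # u ** exponent toward 1, i.e. toward larger (noisier) time-steps.
    exponent = 1.0 / (1.0 + 3.0 * abstraction)
    u = rng.random()  # uniform in [0, 1)
    t = int((u ** exponent) * num_steps)
    # Guard against floating-point rounding that could yield num_steps.
    return min(t, num_steps - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    for score in (0.0, 0.5, 1.0):
        draws = [sample_timestep(score, rng=rng) for _ in range(5000)]
        print(score, sum(draws) / len(draws))
```

With an abstraction score of 0 the draws are uniform over the schedule; as the score rises, the average sampled time-step moves toward the high-noise end. Any monotone skew would serve the same illustrative purpose.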
