SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation

Ellie Arar; Yarden Frenkel; Daniel Cohen-Or; Ariel Shamir; Yael Vinker

TL;DR

SwiftSketch generates vector sketches from images by learning a diffusion process over stroke control-point coordinates, conditioned on image features; it runs inference in under a second with as few as $T=50$ denoising steps. To obtain training data, the authors introduce ControlSketch, which uses a depth-conditioned ControlNet to synthesize a large paired image-sketch dataset. This dataset is used to train SwiftSketch's transformer-decoder, which is designed to handle the discrete nature of vector data. Results show that SwiftSketch generalizes to diverse concepts, producing natural-looking, high-fidelity vector sketches while dramatically reducing generation time compared with optimization-based baselines. The work enables real-time, editable vector sketch generation and provides a scalable data-generation pipeline that can support broader research.

Abstract

Recent advancements in large vision-language models have enabled highly expressive and diverse vector sketch generation. However, state-of-the-art methods rely on a time-consuming optimization process involving repeated feedback from a pretrained model to determine stroke placement. Consequently, despite producing impressive sketches, these methods are limited in practical applications. In this work, we introduce SwiftSketch, a diffusion model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second. SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution. Its transformer-decoder architecture is designed to effectively handle the discrete nature of vector representation and capture the inherent global dependencies between strokes. To train SwiftSketch, we construct a synthetic dataset of image-sketch pairs, addressing the limitations of existing sketch datasets, which are often created by non-artists and lack professional quality. For generating these synthetic sketches, we introduce ControlSketch, a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet. We demonstrate that SwiftSketch generalizes across diverse concepts, efficiently producing sketches that combine high fidelity with a natural and visually appealing style.
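The denoising procedure described in the abstract can be illustrated with a minimal DDPM-style sampling loop over flattened stroke control points. This is a sketch under stated assumptions, not the authors' implementation: the stroke count, the four control points per stroke, the linear noise schedule, the choice to have the network predict clean coordinates, and the `dummy_denoiser` stand-in are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random

T = 50             # number of denoising steps (value quoted in the TL;DR)
NUM_STROKES = 32   # assumption: strokes per sketch
POINTS = 4         # assumption: control points per stroke (cubic Bezier)
DIM = NUM_STROKES * POINTS * 2  # flattened (x, y) coordinates

# Assumption: a linear beta schedule, as in standard DDPM.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


def dummy_denoiser(x_t, t, image_features):
    """Hypothetical stand-in for the trained network.

    A real model would map (noisy strokes, timestep, image features) to a
    prediction of the clean control points. Here the conditioning vector is
    simply returned as the prediction so the loop can run end to end.
    """
    return list(image_features)


def sample(denoiser, image_features, rng):
    """Start from Gaussian noise and denoise for T steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    for t in reversed(range(T)):
        x0_hat = denoiser(x, t, image_features)
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Mean and standard deviation of the Gaussian posterior
        # q(x_{t-1} | x_t, x0), evaluated at the predicted x0.
        coef_x0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        sigma = math.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab_t))
        x = [
            coef_x0 * p + coef_xt * v
            + (sigma * rng.gauss(0.0, 1.0) if t > 0 else 0.0)
            for p, v in zip(x0_hat, x)
        ]
    # Regroup the flat vector into strokes of (x, y) control points.
    return [
        [(x[(s * POINTS + p) * 2], x[(s * POINTS + p) * 2 + 1])
         for p in range(POINTS)]
        for s in range(NUM_STROKES)
    ]


if __name__ == "__main__":
    target = [0.5] * DIM  # placeholder "image features" for the dummy model
    strokes = sample(dummy_denoiser, target, random.Random(0))
    print(len(strokes), "strokes with", len(strokes[0]), "control points each")
```

Because each step needs only one forward pass of the network, a fixed budget of 50 steps bounds the cost of generation, in contrast to optimization-based methods that repeatedly query a pretrained model to place strokes.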
