SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition

Adarsh Tiwari, Sanket Biswas, Josep Lladós

TL;DR

SketchGPT addresses autoregressive modeling for continuous sketches by discretizing strokes into a finite primitive vocabulary and training a GPT‑like decoder to predict next tokens. It introduces a stroke‑to‑primitive abstraction and a decoder‑only transformer to enable generation, completion, and recognition, pre‑training on the QuickDraw corpus with a next‑token objective and fine‑tuning for downstream tasks. Empirical results show competitive generation quality, improved recognizability, and favorable human evaluation compared to SketchRNN, with ablations validating design choices; however, information loss from abstraction and sequence length remain key challenges. The work highlights practical impact for scalable sketch modeling and points to enhanced data representations and longer sequence strategies as avenues for future improvement on more complex sketch domains.
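
To make the idea of discretizing strokes into a finite primitive vocabulary concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes, purely for illustration, that the primitives are a small set of evenly spaced directions and that each stroke segment maps to the nearest one. The names `NUM_DIRECTIONS`, `nearest_primitive`, `tokenize_sketch`, and the special tokens `<bos>`, `<sep>`, `<eos>` are hypothetical.

```python
import math

NUM_DIRECTIONS = 8  # assumed size of the direction-primitive vocabulary (hypothetical)
BOS, SEP, EOS = "<bos>", "<sep>", "<eos>"  # hypothetical special tokens


def nearest_primitive(dx, dy, num_directions=NUM_DIRECTIONS):
    """Return the index of the direction primitive closest in angle to (dx, dy)."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    step = 2 * math.pi / num_directions
    # Round to the nearest bin; the modulo wraps angles just below 2*pi back to bin 0.
    return int(round(angle / step)) % num_directions


def tokenize_sketch(strokes, num_directions=NUM_DIRECTIONS):
    """Map a list of strokes (each a list of (x, y) points) to a flat token sequence."""
    tokens = [BOS]
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            dx, dy = x1 - x0, y1 - y0
            if dx == 0 and dy == 0:
                continue  # skip zero-length segments
            tokens.append(f"p{nearest_primitive(dx, dy, num_directions)}")
        tokens.append(SEP)  # mark the end of each stroke (pen lift)
    tokens.append(EOS)
    return tokens


if __name__ == "__main__":
    square = [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
    print(tokenize_sketch(square))
```

This toy mapping discards segment lengths, which is one simple way to see the kind of information loss from abstraction that the summary lists as a challenge.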

Abstract

We present SketchGPT, a flexible framework that employs a sequence-to-sequence autoregressive model for sketch generation and completion, along with an interpretation case study for sketch recognition. By mapping complex sketches into simplified sequences of abstract primitives, our approach significantly streamlines the input for autoregressive modeling. SketchGPT leverages the next-token prediction objective to learn sketch patterns, facilitating the creation and completion of drawings as well as their accurate categorization. The proposed sketch representation helps overcome existing challenges of autoregressive modeling for continuous stroke data, enabling smoother model training and competitive performance. Our findings demonstrate SketchGPT's capability to generate a diverse variety of drawings, supported by both qualitative and quantitative comparisons with existing state-of-the-art methods, along with a comprehensive human evaluation study. The code and pretrained models will be released on our official GitHub.
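
The next-token prediction objective mentioned in the abstract can be illustrated with a short, framework-free sketch. It shows the generic form of the objective (shift the token sequence by one position and average the cross-entropy over steps), not the paper's exact training code. The function names `next_token_pairs`, `cross_entropy`, and `sequence_loss` are hypothetical, and the logits would come from a decoder model in practice.

```python
import math


def next_token_pairs(token_ids):
    """Shift a sequence by one position: inputs are x[:-1], targets are x[1:]."""
    return token_ids[:-1], token_ids[1:]


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def sequence_loss(per_step_logits, targets):
    """Average next-token cross-entropy over all positions in the sequence."""
    losses = [cross_entropy(l, t) for l, t in zip(per_step_logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    token_ids = [0, 2, 3, 1]  # toy primitive-token ids over a vocabulary of size 4
    inputs, targets = next_token_pairs(token_ids)
    # Stand-in for model outputs: uniform logits at every step.
    logits = [[0.0, 0.0, 0.0, 0.0] for _ in inputs]
    print(sequence_loss(logits, targets))
```

With uniform logits the loss equals the log of the vocabulary size, a useful sanity check for an untrained model.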

Paper Structure

This paper contains 15 sections, 8 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of SketchGPT. Given an input sketch (complete or incomplete), the model predicts the next strokes for incomplete sketches or classifies the complete ones, effectively adapting to different tasks.
  • Figure 2: Illustration of the stroke-to-primitive mapping and tokenization process. Raw stroke data is converted into a more structured representation by first mapping strokes to primitives and then tokenizing them as input to the SketchGPT framework.
  • Figure 3: Adaptability of SketchGPT. We illustrate how the sketch completion pre-training phase can later be adapted to the downstream sketch recognition task.
  • Figure 4: Results of the human user study, based on the five properties shown in the legend.
  • Figure 5: Qualitative Analysis for Sketch Generation. We illustrate different possible completions predicted for multiple incomplete sketches.
  • ...and 2 more figures