Generating Sketches in a Hierarchical Auto-Regressive Process for Flexible Sketch Drawing Manipulation at Stroke-Level
Sicong Zang, Shuhui Gao, Zhijun Fang
TL;DR
This work tackles controllable sketch generation at the stroke level, enabling edits during the drawing process. It introduces Sketch-HARP, a hierarchical auto-regressive framework that first predicts a stroke embedding, then anchors the stroke on the canvas, and finally translates the embedding into drawing actions, all within an auto-regressive loop. The model employs a stroke encoder, a position encoder, and a relationship encoder to produce a sketch code that guides a three-stage generator, trained with a multi-term loss to balance sequence fidelity, spatial placement, and visual quality. Experiments on QuickDraw DS1 and DS2 demonstrate flexible stroke-level manipulation, including replacement, erasure, and expansion, while maintaining competitive sketch reconstruction, underscoring the method's potential for interactive sketch editing.
Abstract
Generating sketches with specified patterns, i.e., manipulating sketches in a controllable way, is a popular task. Recent studies control sketch features at the stroke level by editing the values of stroke embeddings used as conditions. However, to give the generator a global view of the sketch to be drawn, all edited conditions must be collected and fed into the generator simultaneously before generation starts, so no further manipulation is allowed during the sketch generation process. To make sketch drawing manipulation more flexible, we propose a hierarchical auto-regressive sketch generation process. Instead of generating an entire sketch at once, each stroke in a sketch is generated in a three-stage hierarchy: 1) predicting a stroke embedding that represents which stroke is to be drawn, 2) anchoring the predicted stroke on the canvas, and 3) translating the embedding into a sequence of drawing actions that build up the full sketch. Moreover, stroke prediction, anchoring, and translation proceed auto-regressively, i.e., both the recently generated strokes and their positions are considered when predicting the current one, guiding the model to produce an appropriate stroke at a suitable position that benefits the full sketch. Stroke-level sketch drawing can thus be manipulated at any time during generation by adjusting the exposed, editable stroke embeddings.
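The three-stage loop described above can be illustrated with a toy sketch in Python. This is not the Sketch-HARP implementation: the functions `predict_embedding`, `anchor_stroke`, `translate_to_actions`, and `generate_sketch`, the `edit_hook` parameter, and the arithmetic inside each stage are all hypothetical stand-ins for learned networks. The only point is the control flow: each stroke is predicted from the history of earlier strokes and positions, and its embedding is exposed for editing before it is anchored and drawn.

```python
import random
from typing import Callable, List, Optional, Tuple

Embedding = List[float]
Position = Tuple[float, float]
Action = Tuple[float, float, int]            # (dx, dy, pen_lifted)
History = List[Tuple[Embedding, Position]]   # previously generated strokes


def predict_embedding(history: History, rng: random.Random, dim: int = 4) -> Embedding:
    """Stage 1 (toy stand-in): predict the next stroke embedding from the history."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    if not history:
        return noise
    # Condition on earlier strokes by mixing in the mean of their embeddings.
    mean = [sum(e[i] for e, _ in history) / len(history) for i in range(dim)]
    return [0.5 * m + 0.5 * n for m, n in zip(mean, noise)]


def anchor_stroke(embedding: Embedding, history: History) -> Position:
    """Stage 2 (toy stand-in): choose a canvas position given earlier positions."""
    if not history:
        return (0.0, 0.0)
    _, (last_x, last_y) = history[-1]
    return (last_x + 1.0 + abs(embedding[0]), last_y + embedding[1])


def translate_to_actions(embedding: Embedding, n_points: int = 4) -> List[Action]:
    """Stage 3 (toy stand-in): decode an embedding into pen offsets."""
    actions = []
    for k in range(n_points):
        dx = embedding[k % len(embedding)]
        dy = embedding[(k + 1) % len(embedding)]
        pen_lifted = 1 if k == n_points - 1 else 0  # lift the pen after the last point
        actions.append((dx, dy, pen_lifted))
    return actions


def generate_sketch(
    num_strokes: int,
    edit_hook: Optional[Callable[[int, Embedding], Embedding]] = None,
    seed: int = 0,
) -> List[Tuple[Position, List[Action]]]:
    """Generate strokes one at a time; edit_hook may alter any embedding mid-generation."""
    rng = random.Random(seed)
    history: History = []
    sketch = []
    for t in range(num_strokes):
        emb = predict_embedding(history, rng)
        if edit_hook is not None:
            emb = edit_hook(t, emb)          # stroke-level manipulation point
        pos = anchor_stroke(emb, history)
        actions = translate_to_actions(emb)
        history.append((emb, pos))           # later strokes see the edited stroke
        sketch.append((pos, actions))
    return sketch
```

For example, passing `edit_hook=lambda t, emb: [0.0] * len(emb) if t == 1 else emb` replaces only the second stroke's embedding. In this toy setup, the first stroke is unchanged, while the third stroke differs from the unedited run because it is predicted from a history that now contains the edited stroke.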
