Hyperstroke: A Novel High-quality Stroke Representation for Assistive Artistic Drawing

Haoyun Qin, Jian Lin, Hanyuan Liu, Xueting Liu, Chengze Li

TL;DR

Hyperstroke introduces a high-quality, stroke-centric representation for assistive drawing by modeling each stroke as a tokenizable, bounded, 4-channel (RGB plus alpha) image $\mathcal{S}=\langle I,B\rangle$. A grid-based tokenization ($|B|\to\tilde{B}$, $|I|\to\tilde{\mathcal{I}}$) learned via a revised VQGAN captures fine-grained appearance and opacity; an encoder–decoder transformer then predicts sequences of hyperstrokes conditioned on canvas context and CLIP guidance. The approach is trained on a mix of synthetic and real timelapse data to learn implicit stroke dynamics and achieves both faithful stroke reconstruction and plausible, category-conditioned sketch generation on the Quick, Draw! dataset. This work enables iterative co-creative drawing with stroke-aware guidance, promising practical improvements for artists and interactive drawing systems. The representation $\mathcal{S}=\langle I,B\rangle$ and the blending operation $A\circ\mathcal{S}$ form the core primitives enabling incremental composition and temporal modeling.
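To make the blending operation $A\circ\mathcal{S}$ concrete, the following is a minimal sketch that composites a bounded RGBA stroke patch onto an RGB canvas. It assumes the standard "over" operator with non-premultiplied alpha, which the summary above does not confirm as the paper's exact formulation. The function name `blend_stroke` and the `(top, left)` box layout are hypothetical choices made for illustration, and plain Python lists stand in for image arrays.

```python
def blend_stroke(canvas, stroke, box):
    """Composite a bounded RGBA stroke onto an RGB canvas ('over' operator).

    canvas: H x W rows of (r, g, b) floats in [0, 1]
    stroke: h x w rows of (r, g, b, a) floats in [0, 1], non-premultiplied
    box:    (top, left) placement of the patch (hypothetical layout for B)
    """
    top, left = box
    out = [list(row) for row in canvas]  # copy rows; the input stays untouched
    for dy, stroke_row in enumerate(stroke):
        for dx, (r, g, b, a) in enumerate(stroke_row):
            y, x = top + dy, left + dx
            if not (0 <= y < len(out) and 0 <= x < len(out[0])):
                continue  # skip stroke pixels that fall outside the canvas
            cr, cg, cb = out[y][x]
            # Assumed blending rule: result = alpha * stroke + (1 - alpha) * canvas
            out[y][x] = (a * r + (1 - a) * cr,
                         a * g + (1 - a) * cg,
                         a * b + (1 - a) * cb)
    return out


# Toy example: a white 2 x 3 canvas and a 1 x 2 stroke patch.
canvas = [[(1.0, 1.0, 1.0)] * 3 for _ in range(2)]
stroke = [[(0.0, 0.0, 0.0, 1.0),    # opaque black pixel
           (1.0, 0.0, 0.0, 0.5)]]   # half-transparent red pixel
result = blend_stroke(canvas, stroke, box=(0, 1))
print(result[0])
```

Applying such a step repeatedly, one stroke at a time, is one way to picture the incremental composition that the representation is meant to support.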

Abstract

Assistive drawing aims to facilitate the creative process by providing intelligent guidance to artists. Existing solutions often fail to effectively model intricate stroke details or adequately address the temporal aspects of drawing. We introduce hyperstroke, a novel stroke representation designed to capture precise fine stroke details, including RGB appearance and alpha-channel opacity. Using a vector quantization approach, hyperstroke learns compact tokenized representations of strokes from real-life videos of artistic drawing. With hyperstroke, we propose to model assistive drawing via a transformer-based architecture to enable intuitive and user-friendly drawing applications, which we examine in an exploratory evaluation.
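The tokenization idea behind vector quantization can be illustrated with a small sketch: each feature vector is replaced by the index of its nearest codebook entry, and decoding looks the entry back up. This shows only the lookup step under simplifying assumptions. The codebook here is a fixed toy example rather than a learned one, the helper names `quantize` and `dequantize` are hypothetical, and the learned encoder and decoder of a VQGAN are omitted entirely.

```python
def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    def sq_dist(u, v):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))
        for vec in vectors
    ]


def dequantize(tokens, codebook):
    """Replace each token index with its codebook vector."""
    return [codebook[k] for k in tokens]


# Toy codebook with three 2-D entries (illustrative values, not learned).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.9, 0.1), (0.1, 0.2), (0.2, 0.8)]

tokens = quantize(features, codebook)
print(tokens)                        # discrete tokens for a sequence model
print(dequantize(tokens, codebook))  # quantized approximation of the input
```

The resulting integer sequence is the kind of discrete representation that a transformer can model directly.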

Paper Structure (17 sections, 3 equations, 10 figures)

Figures (10)

  • Figure 1: Example of real-life artistic drawing. The incremental drawing on canvas $A_t$ is recorded as a timelapse video. The user-provided stroke $\mathcal{S}_t$ is not included in the timelapse and has to be explicitly estimated.
  • Figure 2: Overview of our framework. The right demonstrates the learning of tokenization in hyperstrokes (Section \ref{subsec:hyperstroke}), while the left shows our systematic design in predictive incremental drawing (Section \ref{subsec:sequence}).
  • Figure 3: Reconstruction of real-life incremental drawing from timelapse videos. (a) Timelapse snapshot at $t=328$; (b) Reconstructed canvas composited with hyperstrokes; (c) Inferred stroke sequences from adjacent timelapse frames.
  • Figure 4: Results on predictive incremental drawing conditioned on raster canvas and text descriptions. Odd rows show predicted compositions; even rows show the decoded grounded strokes within their bounding boxes. The last example prompts 2 hyperstrokes in the decoder.
  • Figure 5: Data examples used to train the hyperstroke representation. The first group shows data from the synthetic dataset: from top to bottom, original illustrations, synthetic stroke images, and blended results. Here, supervision comes directly from the ground-truth synthetic stroke. The second group shows data from real-life timelapse videos: from top to bottom, the earlier frame of each frame pair, the stroke predicted by our model (not part of the dataset), and the later frame of the pair. Here, supervision is applied implicitly through the two frames.
  • ...and 5 more figures