On the Temporality for Sketch Representation Learning

Marcelo Isaias de Moraes Junior, Moacir Antonelli Ponti

TL;DR

This paper systematically probes the temporality of sketch representations, evaluating autoregressive versus non-autoregressive decoding, absolute versus relative coordinates, and various sketch-order permutations. Across classification, segmentation, and reconstruction on QuickDraw and SPG, it finds absolute coordinates and Stroke-5 representations generally superior, with non-autoregressive decoders yielding better reconstruction and downstream performance. The results show that temporality matters, but that its value depends on the task and the encoding; inter-stroke order has a larger impact than intra-stroke order. These findings guide practical design choices for sketch models and suggest directions for sketch generation.
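To make the coordinate distinction concrete, here is a minimal sketch of a Stroke-5-style encoding in both absolute and relative form. It assumes the commonly used Stroke-5 layout, in which each row holds two coordinate values followed by three one-hot pen states (pen down, pen lifted after this point, end of sketch). The function name `to_stroke5`, the `relative` flag, and the choice of a zero-coordinate end-of-sketch row are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Row = Tuple[float, float, int, int, int]


def to_stroke5(strokes: List[Stroke], relative: bool = False) -> List[Row]:
    """Flatten a list of strokes into rows (x, y, p1, p2, p3).

    p1 = pen stays down, p2 = pen lifts after this point, p3 = end of sketch.
    With relative=True, x and y are offsets from the previous point
    (the first point is an offset from the origin). Hypothetical helper.
    """
    rows: List[Row] = []
    prev_x, prev_y = 0.0, 0.0
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            last_in_stroke = i == len(stroke) - 1
            pen = (0, 1, 0) if last_in_stroke else (1, 0, 0)
            if relative:
                rows.append((x - prev_x, y - prev_y, *pen))
            else:
                rows.append((x, y, *pen))
            prev_x, prev_y = x, y
    # Assumed convention: a zero-coordinate row marks the end of the sketch.
    rows.append((0.0, 0.0, 0, 0, 1))
    return rows


# Two strokes: a horizontal line, then a second line drawn right to left.
example = [[(0, 0), (2, 0)], [(2, 3), (0, 3)]]
absolute_rows = to_stroke5(example)
relative_rows = to_stroke5(example, relative=True)
```

In the absolute form every row can be read on its own, whereas in the relative form a point's position is only recoverable by summing all preceding offsets.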

Abstract

Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced significantly, there is still a gap in understanding the true relevance of the temporal aspect to the quality of these representations. This work investigates whether it is indeed justifiable to treat sketches as sequences, as well as which internal orders play a more relevant role. The results indicate that, although the use of traditional positional encodings is valid for modeling sketches as sequences, absolute coordinates consistently outperform relative ones. Furthermore, non-autoregressive decoders outperform their autoregressive counterparts. Finally, the importance of temporality was shown to depend on both the order considered and the task evaluated.

Paper Structure

This paper contains 13 sections, 1 equation, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Proposed sketch order permutations. The triangle's position denotes the point's absolute coordinates, and it points to the next point following the intra-stroke order.
  • Figure 2: (Left) Mean MSE and standard deviation (lower is better) at each sketch position on the test set. (Right) Train and test set distributions of sketch lengths. Both distributions are similarly right-skewed due to rare long sequences.
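
The difference between inter-stroke and intra-stroke order referred to in Figure 1 can be illustrated with a few simple reorderings. The sketch below is only an assumption about what such permutations might look like; the exact set of permutations proposed in the paper is not listed in this summary, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def reverse_inter_stroke(strokes: List[Stroke]) -> List[Stroke]:
    """Reverse the order of the strokes; points inside each stroke are untouched."""
    return [list(s) for s in reversed(strokes)]


def reverse_intra_stroke(strokes: List[Stroke]) -> List[Stroke]:
    """Reverse the points inside each stroke; the stroke order is untouched."""
    return [list(reversed(s)) for s in strokes]


def shuffle_inter_stroke(strokes: List[Stroke], seed: int = 0) -> List[Stroke]:
    """Randomly reorder the strokes using a seeded generator (illustrative only)."""
    rng = random.Random(seed)
    shuffled = [list(s) for s in strokes]
    rng.shuffle(shuffled)
    return shuffled


# Three short strokes with absolute coordinates.
sketch = [[(0, 0), (1, 0)], [(1, 1), (2, 1), (3, 1)], [(5, 5), (6, 6)]]
inter_reversed = reverse_inter_stroke(sketch)
intra_reversed = reverse_intra_stroke(sketch)
inter_shuffled = shuffle_inter_stroke(sketch, seed=42)
```

Each function returns a new list, so the original sketch stays available as the unpermuted baseline. With absolute coordinates, these reorderings change only the sequence order and not the set of drawn points.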