On the Temporality for Sketch Representation Learning
Marcelo Isaias de Moraes Junior, Moacir Antonelli Ponti
TL;DR
This paper systematically probes the temporality of sketch representations, evaluating autoregressive versus non-autoregressive decoding, absolute versus relative coordinates, and various sketch-order permutations. Across classification, segmentation, and reconstruction on QuickDraw and SPG, it finds absolute coordinates and Stroke-5 representations generally superior, with non-autoregressive decoders yielding better reconstruction and downstream performance. The results show that temporality matters, but that its value depends on the task and the encoding, and that inter-stroke order has a larger impact than intra-stroke order. These findings can guide practical design choices for sketch models and suggest directions for sketch generation.
Abstract
Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced significantly, there is still a gap in understanding the true relevance of the temporal aspect to the quality of these representations. This work investigates whether it is indeed justifiable to treat sketches as sequences, as well as which internal orders play a more relevant role. The results indicate that, although the use of traditional positional encodings is valid for modeling sketches as sequences, absolute coordinates consistently outperform relative ones. Furthermore, non-autoregressive decoders outperform their autoregressive counterparts. Finally, the importance of temporality was shown to depend on both the order considered and the task evaluated.
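
To make the terms above concrete, the following minimal Python sketch shows one way a sketch could be encoded with absolute or relative coordinates, and how inter-stroke and intra-stroke order differ. It is an illustration under stated assumptions, not the authors' implementation: the five-element layout (x, y, p1, p2, p3) follows the commonly used Stroke-5 convention, and the names `to_stroke5`, `reverse_inter_stroke`, and `reverse_intra_stroke` are hypothetical. Reversal is used here only as one simple example of an order permutation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]                       # one pen-down trajectory
Row = Tuple[float, float, int, int, int]   # (x, y, p1, p2, p3)


def to_stroke5(strokes: List[Stroke], relative: bool = True) -> List[Row]:
    """Flatten strokes into Stroke-5-style rows (assumed layout).

    p1 = 1: pen stays down after this point
    p2 = 1: pen is lifted after this point (last point of a stroke)
    p3 = 1: end-of-sketch token
    With relative=True each row stores the offset from the previous point
    (the first point is offset from the origin); otherwise it stores the
    absolute position.
    """
    rows: List[Row] = []
    prev_x, prev_y = 0.0, 0.0
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            pen_up = 1 if i == len(stroke) - 1 else 0
            if relative:
                rows.append((x - prev_x, y - prev_y, 1 - pen_up, pen_up, 0))
            else:
                rows.append((x, y, 1 - pen_up, pen_up, 0))
            prev_x, prev_y = x, y
    # End-of-sketch token; the zero coordinates are an arbitrary filler.
    rows.append((0.0, 0.0, 0, 0, 1))
    return rows


def reverse_inter_stroke(strokes: List[Stroke]) -> List[Stroke]:
    """Change the order in which strokes are drawn; points stay as-is."""
    return list(reversed(strokes))


def reverse_intra_stroke(strokes: List[Stroke]) -> List[Stroke]:
    """Change the point order inside each stroke; stroke order stays as-is."""
    return [list(reversed(s)) for s in strokes]


if __name__ == "__main__":
    sketch = [[(0, 0), (1, 0)], [(1, 1), (2, 3)]]
    print(to_stroke5(sketch, relative=False))
    print(to_stroke5(sketch, relative=True))
    print(reverse_inter_stroke(sketch))
    print(reverse_intra_stroke(sketch))
```

Both encodings carry the same geometry. They differ only in whether a point is stored by its position or by its displacement from the previous point, so a relative encoding is directly affected by any reordering of the points.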
