SketchINR: A First Look into Sketches as Implicit Neural Representations
Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song
TL;DR
This work introduces SketchINR, an implicit neural representation for vector sketches that encodes variable-length sketches into a fixed-size latent code and uses a global implicit function to map time and stroke indices to ink coordinates and pen states. By leveraging positional embeddings, a visual-intensity based loss, and a multi-sketch auto-decoder with a VAE-based generator, SketchINR achieves high-fidelity reconstruction with dramatically reduced storage (up to ~60x vs raster and ~10x vs vector) and enables parallel decoding orders of magnitude faster than autoregressive methods. It further supports sketch abstraction, interpolation, completion, and generation, including a-temporal completion, demonstrating a flexible, compact codec capable of handling complex scene sketches like FS-COCO. Overall, SketchINR offers a new implicit, controllable, and scalable paradigm for modelling and manipulating sketches, with practical implications for compression, generation, and human-like abstraction.
Abstract
We propose SketchINR to advance the representation of vector sketches with implicit neural models. A variable-length vector sketch is compressed into a latent space of fixed dimension that implicitly encodes the underlying shape as a function of time and strokes. The learned function predicts the $xy$ point coordinates in a sketch at each time and stroke. Despite its simplicity, SketchINR outperforms existing representations on multiple tasks: (i) Encoding an entire sketch dataset into a fixed-size latent vector, SketchINR gives $60\times$ and $10\times$ data compression over raster and vector sketches, respectively. (ii) SketchINR's auto-decoder provides a much higher-fidelity representation than other learned vector sketch representations, and is uniquely able to scale to complex vector sketches such as FS-COCO. (iii) SketchINR supports parallelisation and can decode/render $\sim 100\times$ faster than other learned vector representations such as SketchRNN. (iv) SketchINR, for the first time, emulates the human ability to reproduce a sketch with varying abstraction in terms of the number and complexity of strokes. As a first look at implicit sketches, SketchINR's compact, high-fidelity representation will support future work in modelling long and complex sketches.
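To make the core idea concrete, the following is a minimal sketch of an implicit sketch function: a fixed-size latent code, together with positional embeddings of a time value and a stroke position, is passed through a small MLP that returns an $xy$ coordinate. It is an illustration under stated assumptions, not the authors' implementation: the names (`positional_embedding`, `init_mlp`, `decode_sketch`), the layer sizes, the sinusoidal embedding, the per-stroke parameterisation of time, and the random weights standing in for trained parameters are all hypothetical. Pen states are omitted for brevity.

```python
import math
import random


def positional_embedding(value, num_freqs):
    """Sinusoidal embedding of a scalar in [0, 1] (assumed form)."""
    out = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        out.append(math.sin(freq * value))
        out.append(math.cos(freq * value))
    return out


def init_mlp(sizes, rng):
    """Random weights for a small MLP; a stand-in for trained parameters."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp_forward(layers, x):
    """Apply each layer; ReLU on every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def decode_point(layers, latent, t, stroke_pos, num_freqs=4):
    """Query the implicit function at one (time, stroke) location."""
    features = (list(latent)
                + positional_embedding(t, num_freqs)
                + positional_embedding(stroke_pos, num_freqs))
    x, y = mlp_forward(layers, features)
    return x, y


def decode_sketch(layers, latent, num_strokes, points_per_stroke):
    """Decode a full sketch. Every query is independent of the others,
    so the loops below could be replaced by one batched, parallel call."""
    sketch = []
    for s in range(num_strokes):
        stroke_pos = s / max(num_strokes - 1, 1)
        stroke = []
        for i in range(points_per_stroke):
            t = i / max(points_per_stroke - 1, 1)
            stroke.append(decode_point(layers, latent, t, stroke_pos))
        sketch.append(stroke)
    return sketch


rng = random.Random(0)
latent_dim, num_freqs = 8, 4
latent = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]
# Input width: latent + two embeddings of 2 * num_freqs values each.
layers = init_mlp([latent_dim + 4 * num_freqs, 32, 2], rng)

# The same fixed-size latent can be sampled at different resolutions.
coarse = decode_sketch(layers, latent, num_strokes=2, points_per_stroke=5)
fine = decode_sketch(layers, latent, num_strokes=2, points_per_stroke=20)
```

Because no query depends on a previously decoded point, the decoder has no sequential bottleneck of the kind found in autoregressive models, and the sampling density is chosen at decode time rather than fixed by the stored representation.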
