
SketchINR: A First Look into Sketches as Implicit Neural Representations

Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song

TL;DR

This work introduces SketchINR, an implicit neural representation for vector sketches. It encodes each variable-length sketch into a fixed-size latent code and uses a global implicit function to map time and stroke indices to ink coordinates and pen states. By combining positional embeddings, a visual-intensity-based loss, and a multi-sketch auto-decoder with a VAE-based generator, SketchINR achieves high-fidelity reconstruction with greatly reduced storage (~60× smaller than raster and ~10× smaller than vector sketches) and supports parallel decoding roughly 100× faster than autoregressive methods such as SketchRNN. It also supports sketch abstraction, interpolation, completion (including a-temporal completion), and generation, demonstrating a flexible, compact codec that scales to complex scene sketches such as FS-COCO. Overall, SketchINR offers a new implicit, controllable, and scalable paradigm for modelling and manipulating sketches, with practical implications for compression, generation, and human-like abstraction.
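
To make the idea of a global implicit function concrete, here is a minimal sketch in Python with NumPy. It assumes a NeRF-style sinusoidal positional embedding for the normalised time $t$ and stroke index $s$, and a two-layer MLP with random, untrained weights; the class name `ImplicitSketchDecoder`, the layer sizes, and the exact embedding form are hypothetical and not taken from the paper. Because each $(t, s)$ query is evaluated independently given the latent code $z$, a whole sketch can be decoded in one batched call.

```python
import numpy as np

def positional_encoding(x, num_freqs=6):
    """Sinusoidal embedding [sin(2^l * pi * x), cos(2^l * pi * x)] for l = 0..num_freqs-1."""
    x = np.asarray(x, dtype=np.float64)[..., None]            # (..., 1)
    angles = x * (2.0 ** np.arange(num_freqs)) * np.pi         # (..., num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

class ImplicitSketchDecoder:
    """Hypothetical f(t, s; z) -> (xy, pen probability) with random, untrained weights."""

    def __init__(self, latent_dim=64, num_freqs=6, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        self.num_freqs = num_freqs
        in_dim = 2 * (2 * num_freqs) + latent_dim              # embed(t) + embed(s) + z
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, t, s, z):
        t = np.asarray(t, dtype=np.float64)
        s = np.asarray(s, dtype=np.float64)
        z = np.broadcast_to(z, t.shape + (len(z),))            # same code for every query
        feats = np.concatenate(
            [positional_encoding(t, self.num_freqs), positional_encoding(s, self.num_freqs), z], axis=-1
        )
        h = np.maximum(feats @ self.W1 + self.b1, 0.0)         # ReLU hidden layer
        out = h @ self.W2 + self.b2
        xy = out[..., :2]                                       # ink coordinates
        pen = 1.0 / (1.0 + np.exp(-out[..., 2]))               # pen-state probability
        return xy, pen

# Decode 3 strokes x 10 timestamps for one latent code in a single batched call.
dec = ImplicitSketchDecoder()
z = np.random.default_rng(1).normal(size=64)
t = np.tile(np.arange(10) / 10, 3)
s = np.repeat(np.arange(3) / 3, 10)
xy, pen = dec(t, s, z)
print(xy.shape, pen.shape)
```

Evaluating `dec(t[5], s[5], z)` on its own gives the same values as row 5 of the batched result, since no query depends on another. An autoregressive decoder such as SketchRNN instead has to produce points one after another.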

Abstract

We propose SketchINR to advance the representation of vector sketches with implicit neural models. A variable-length vector sketch is compressed into a latent space of fixed dimension that implicitly encodes the underlying shape as a function of time and strokes. The learned function predicts the $xy$ point coordinates in a sketch at each time and stroke. Despite its simplicity, SketchINR outperforms existing representations at multiple tasks: (i) Encoding an entire sketch dataset into fixed-size latent vectors, SketchINR gives $60\times$ and $10\times$ data compression over raster and vector sketches, respectively. (ii) SketchINR's auto-decoder provides a much higher-fidelity representation than other learned vector sketch representations, and is uniquely able to scale to complex vector sketches such as FS-COCO. (iii) SketchINR supports parallelisation that can decode/render $\sim 100\times$ faster than other learned vector representations such as SketchRNN. (iv) SketchINR, for the first time, emulates the human ability to reproduce a sketch at varying levels of abstraction in terms of the number and complexity of strokes. As a first look at implicit sketches, SketchINR's compact, high-fidelity representation will support future work in modelling long and complex sketches.
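
The compression argument rests on a fixed-size code costing the same number of bits regardless of sketch length, while a vector sketch grows with its point count. The back-of-the-envelope sketch below assumes a 64-dimensional float32 latent code and a stroke-3 style vector sketch stored as 16-bit $(\Delta x, \Delta y, \text{pen})$ triples; both encodings and the helper names `latent_bits` and `vector_bits` are hypothetical. It ignores the shared decoder's weights (amortised over the whole dataset) and does not reproduce the paper's reported ratios.

```python
def latent_bits(latent_dim=64, bits_per_value=32):
    """Bits for one fixed-size latent code (assumed float32 entries)."""
    return latent_dim * bits_per_value

def vector_bits(num_points, values_per_point=3, bits_per_value=16):
    """Bits for a stroke-3 style vector sketch, (dx, dy, pen) per point (assumed 16-bit values)."""
    return num_points * values_per_point * bits_per_value

# Storage ratio (vector / latent) for sketches of increasing length.
for n in (50, 200, 1000):
    ratio = vector_bits(n) / latent_bits()
    print(f"{n:5d} points: vector {vector_bits(n):6d} bits, latent {latent_bits()} bits, ratio {ratio:.2f}")
```

The latent cost stays flat while the vector cost grows linearly, so the advantage of a fixed-size code is largest for long, complex sketches.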

Paper Structure

This paper contains 13 sections, 7 equations, 11 figures, and 1 table.

Figures (11)

  • Figure 1: We model a sketch as a function $f_{\theta}(t_{j}, s_{k})$ of $J$ timestamps and $K$ strokes. Timestamps $\{t_a, \dots, t_b\}$, where $t_j \in [0,1)$, correspond to way-points of a specific stroke $s_k$. Once trained, $f_\theta$ with learned weights $\theta$ can be sampled with an arbitrary number of strokes, using $t_j \in \{0, \frac{1}{J}, \dots, \frac{J-1}{J}\}$ together with $s_k \in \{0, \frac{1}{5}, \dots, \frac{4}{5}\}$ for $K=5$ or $s_k \in \{0, \frac{1}{10}, \dots, \frac{9}{10}\}$ for $K=10$. Fewer strokes ($K=5$) increase abstraction and more strokes ($K=10$) decrease it.
  • Figure 2: Effect of the smoothing factor $\gamma$ on training. Reducing $\gamma$ increases the intensity in the region surrounding stroke $s_{k}$, similar to dilating the stroke in the intensity map $\mathrm{I}_{k}$. A lower $\gamma$ leads to stable training (plot on right) but loses fine-grained details; a higher $\gamma$ gives a fine-grained sketch but is harder to train.
  • Figure 3: SketchINR model diagram. (a) Embedding a single vector sketch as an implicit model $F$ mapping stroke and time $(\mathbf{s},\mathbf{t})$ to ink coordinate $\mathbf{p}$. (b) Embedding a vector sketch dataset as a shared decoder and a set of latent vectors $\mathcal{V}$. (c) Training a generative model for implicit sketches by generating latent codes $\nu$ with an encoder $E$ that takes raster sketches as input.
  • Figure 4: Qualitative reconstruction results for neural sketch representations. SketchINR is uniquely able to scale to sketches of Sketchy and FS-COCO complexity.
  • Figure 5: Rate distortion. SketchINR can encode complex scene sketches from FS-COCO [chowdhury2022fs] in highly compact ($\mathbb{R}^{64}$) latent codes. Despite nearly identical sketch quality (lower CD indicates higher fidelity), SketchINR has $\sim 60\times$ lower BPP than PNG raster sketches and $\sim 10\times$ lower than vector sketches.
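
Figure 1's sampling scheme can be written as a small query-grid helper. This is an illustrative sketch: `query_grid` is a hypothetical name, and it only enumerates the normalised $(t_j, s_k)$ inputs that would be fed to a trained $f_\theta$; changing $K$ alone changes how many strokes are requested.

```python
def query_grid(J, K):
    """Return every (t_j, s_k) pair with t_j = j/J and s_k = k/K, grouped stroke by stroke."""
    return [(j / J, k / K) for k in range(K) for j in range(J)]

coarse = query_grid(J=8, K=5)    # 5 strokes: more abstract rendering
fine = query_grid(J=8, K=10)     # 10 strokes: less abstract rendering
print(len(coarse), len(fine))
print(sorted({s for _, s in coarse}))
```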
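
Figure 2's trade-off can be illustrated with a soft rasteriser. The sketch below assumes a Gaussian falloff $\mathrm{I}(u) = \max_p \exp(-\gamma \lVert u - p \rVert^2)$ over stroke points $p$ and a mean-squared intensity loss; the paper's exact intensity formulation may differ, and `intensity_map` and `intensity_loss` are hypothetical names.

```python
import numpy as np

def intensity_map(points, size=32, gamma=200.0):
    """Soft raster of 2D points in [0, 1]^2: I(u) = max_p exp(-gamma * ||u - p||^2).

    The Gaussian falloff is an assumption for illustration: smaller gamma spreads ink
    further from the stroke (a dilation-like effect), larger gamma keeps it thin.
    """
    points = np.asarray(points, dtype=np.float64)                  # (N, 2)
    centres = (np.arange(size) + 0.5) / size                       # pixel centres in [0, 1]
    gx, gy = np.meshgrid(centres, centres, indexing="xy")
    grid = np.stack([gx, gy], axis=-1).reshape(-1, 2)              # (size*size, 2)
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2).max(axis=1).reshape(size, size)

def intensity_loss(pred_points, target_points, size=32, gamma=200.0):
    """Mean squared error between the intensity maps of predicted and target points."""
    diff = intensity_map(pred_points, size, gamma) - intensity_map(target_points, size, gamma)
    return float(np.mean(diff ** 2))

# A horizontal stroke through the middle of the canvas.
stroke = np.stack([np.linspace(0.2, 0.8, 50), np.full(50, 0.5)], axis=-1)
# Fraction of pixels above 0.5 intensity; the lower gamma should cover more pixels.
coverage = {g: float((intensity_map(stroke, gamma=g) > 0.5).mean()) for g in (50.0, 2000.0)}
print(coverage)
shifted = stroke + np.array([0.0, 0.05])
print(intensity_loss(stroke, stroke), intensity_loss(shifted, stroke))
```

A wider falloff gives nonzero gradients even when predicted points are far from the target stroke, which is one plausible reading of why a lower $\gamma$ trains more stably but blurs fine detail.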
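
Figure 3(b)'s shared decoder plus a set of latent vectors is an auto-decoder: there is no encoder, and each sketch's code is a free parameter optimised jointly with the decoder. The sketch below illustrates only that idea. It assumes a deliberately tiny linear decoder over $[z_i, \text{embed}(t)]$, toy "sketches" that are circles with different centres, hand-written gradients, and a small L2 prior `lam` on the codes; `fit_auto_decoder` is a hypothetical name and none of this is the paper's architecture.

```python
import numpy as np

def positional_encoding(t, num_freqs=4):
    """Sinusoidal embedding [sin(2^l * pi * t), cos(2^l * pi * t)] for l = 0..num_freqs-1."""
    t = np.asarray(t, dtype=np.float64)[..., None]
    angles = t * (2.0 ** np.arange(num_freqs)) * np.pi
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def fit_auto_decoder(targets, latent_dim=8, num_freqs=4, lr=0.3, steps=2000, lam=1e-4, seed=0):
    """Jointly optimise per-sketch latent codes Z and a shared linear decoder (W, b).

    targets: (N, T, 2) xy points of N sketches sampled at t_j = j / T.
    There is no encoder: each row Z[i] is a free parameter updated by gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, T, _ = targets.shape
    pe = positional_encoding(np.arange(T) / T, num_freqs)        # (T, 2F), shared by all sketches
    Z = rng.normal(0.0, 0.1, (n, latent_dim))                     # one latent code per sketch
    W = rng.normal(0.0, 0.1, (latent_dim + pe.shape[1], 2))       # shared decoder weights
    b = np.zeros(2)
    losses = []
    for _ in range(steps):
        feats = np.concatenate(
            [np.repeat(Z[:, None, :], T, axis=1), np.broadcast_to(pe, (n, T, pe.shape[1]))], axis=-1
        )                                                          # (N, T, D + 2F)
        resid = feats @ W + b - targets                            # (N, T, 2)
        losses.append(float(np.mean(resid ** 2) + lam * np.sum(Z ** 2)))
        grad_out = 2.0 * resid / resid.size                        # d mean(resid^2) / d prediction
        grad_W = np.einsum("ntf,ntc->fc", feats, grad_out)
        grad_b = grad_out.sum(axis=(0, 1))
        grad_Z = (grad_out @ W.T)[:, :, :latent_dim].sum(axis=1) + 2.0 * lam * Z
        W -= lr * grad_W
        b -= lr * grad_b
        Z -= lr * grad_Z
    return Z, W, b, losses

# Toy dataset: four circles that differ only in their centre.
t = np.arange(16) / 16
centres = np.array([[0.3, 0.3], [0.7, 0.3], [0.3, 0.7], [0.7, 0.7]])
circle = 0.2 * np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=-1)  # (16, 2)
targets = centres[:, None, :] + circle[None, :, :]                               # (4, 16, 2)
Z, W, b, losses = fit_auto_decoder(targets)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The shared weights learn what all sketches have in common (here, the circle), while each code in `Z` absorbs what is specific to one sketch (here, its centre).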
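
For Figure 5's axes, here is a minimal sketch of the two metrics. It assumes CD denotes a symmetric Chamfer distance over sampled points (squared nearest-neighbour distances averaged in both directions; other variants exist) and that BPP is the code size in bits divided by the raster's pixel count. Neither definition is confirmed by this summary, and the function names are hypothetical.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 2) and b (M, 2)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)    # (N, M) squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def bits_per_pixel(num_bits, height, width):
    """Code size normalised by the number of pixels in the reference raster."""
    return num_bits / (height * width)

a = [[0.0, 0.0], [1.0, 0.0]]
b = [[0.0, 0.1], [1.0, 0.1]]                                    # a shifted up by 0.1
print(chamfer_distance(a, a), chamfer_distance(a, b))
print(bits_per_pixel(64 * 32, 256, 256))                        # a float32 R^64 code on a 256x256 canvas
```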