DeepIcon: A Hierarchical Network for Layer-wise Icon Vectorization

Qi Bing, Chaoyi Zhang, Weidong Cai

TL;DR

The experimental results indicate that DeepIcon can efficiently produce Scalable Vector Graphics (SVGs) directly from raster images, bypassing the need for a differentiable rasterizer while also demonstrating a profound understanding of the image contents.

Abstract

In contrast to the well-established technique of rasterization, vectorization of images poses a significant challenge in the field of computer graphics. Recent learning-based methods for converting raster images to vector formats frequently suffer from incomplete shapes, redundant path prediction, and a lack of accuracy in preserving the semantics of the original content. These shortcomings severely hinder the utility of these methods for further editing and manipulation of images. To address these challenges, we present DeepIcon, a novel hierarchical image vectorization network specifically tailored for generating variable-length icon vector graphics based on the raster image input. Our experimental results indicate that DeepIcon can efficiently produce Scalable Vector Graphics (SVGs) directly from raster images, bypassing the need for a differentiable rasterizer while also demonstrating a profound understanding of the image contents.

Paper Structure

This paper contains 24 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Three examples of frequently encountered issues in image vectorization. Colors distinguish individual SVG paths. (a): Redundant path predictions. Despite the redundancy, the rendered output remains quantitatively accurate, indicating that evaluation metrics may not reflect redundant paths. (b): The prediction has low quantitative accuracy because the predicted paths are geometrically offset. However, this example demonstrates the model's capability to grasp the underlying semantics and relationships between shapes, such as recognizing two rectangles and one line. (c): Compared with (b), shapes that are predicted incompletely but positioned accurately may achieve higher pixel accuracy.
  • Figure 2: The fundamental workflow by which DeepIcon performs image vectorization, converting images into SVG format. The image is first encoded into a single embedding, which is then fed into a decoder to generate a sequence of parametric shapes.
  • Figure 3: The overall architecture of DeepIcon. The input image is encoded with a CLIP image encoder to generate the latent embedding $z_I$. This embedding is then fed into a Structure Decoder to infer a series of path embeddings $z_P$ and corresponding path visibility attributes $v_P$. For each path embedding, an individual path decoder outputs a pair of sequences $(\{T_{j,k}\}, \{A_{j,k}\})$ that defines the attributes and continuous arguments for each inferred SVG path. Here $T_{j,k}$ and $A_{j,k}$ denote the $k^{th}$ token of the $j^{th}$ command sequence within path $P_i$.
  • Figure 4: Our qualitative comparison with SOTA methods DeepSVG and LIVE.
  • Figure 5: Ablation study on our proposed image vectorization network. The first column showcases the target image, which also serves as the input for our models. The specific configurations for models A through E are detailed in Table \ref{tab:ablat}.
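
The three cases in the Figure 1 caption can be reproduced on a toy example. The sketch below is not the paper's evaluation code: it assumes axis-aligned rectangles as stand-ins for SVG paths, an 8x8 pixel grid, and intersection-over-union as the pixel metric, and the names `rasterize` and `pixel_iou` are hypothetical. It only illustrates why a pixel-level score can miss redundant paths and can rank an incomplete shape above a complete but shifted one.

```python
# Toy illustration only. Rectangles stand in for SVG paths; the helper
# names, the 8x8 grid, and the IoU metric are assumptions, not the paper's setup.

def rasterize(rects, size=8):
    """Render half-open rectangles (x0, y0, x1, y1) into a set of filled pixels."""
    filled = set()
    for x0, y0, x1, y1 in rects:
        for y in range(max(y0, 0), min(y1, size)):
            for x in range(max(x0, 0), min(x1, size)):
                filled.add((x, y))
    return filled


def pixel_iou(pred, target):
    """Intersection-over-union between two sets of filled pixels."""
    union = pred | target
    return len(pred & target) / len(union) if union else 1.0


# Target: a single 4x4 square (16 pixels).
target = rasterize([(2, 2, 6, 6)])

# (a) Redundant: three overlapping paths that cover exactly the same pixels.
redundant = rasterize([(2, 2, 6, 6), (2, 2, 6, 4), (3, 3, 5, 5)])

# (b) Offset: the correct shape, shifted by two pixels in x and y.
offset = rasterize([(4, 4, 8, 8)])

# (c) Incomplete: correctly positioned, but the bottom row is missing.
incomplete = rasterize([(2, 2, 6, 5)])

print(pixel_iou(redundant, target))          # 1.0
print(pixel_iou(incomplete, target))         # 0.75
print(round(pixel_iou(offset, target), 3))   # 0.143
```

The redundant prediction scores a perfect 1.0 even though it uses three paths where one suffices, and the incomplete shape outscores the offset one although the offset prediction has the correct geometry. A metric computed on rendered pixels alone therefore says little about path count or shape semantics.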