SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, Yu-Kun Lai

TL;DR

This work addresses fast, high-fidelity vectorization of raster images into SVG. It introduces SuperSVG, which partitions an image into superpixels, uses a coarse stage to capture the main structure, and follows it with a refinement stage that adds details; a Dynamic Path Warping (DPW) loss transfers knowledge from the coarse stage to the refinement stage. The approach is reported to achieve state-of-the-art reconstruction accuracy and substantially faster inference than prior methods, demonstrated on ImageNet-scale data and auxiliary Emoji comparisons. The key contributions are a coarse-path-guided refinement strategy and a differentiable DPW loss that enables effective end-to-end training and efficient vectorization.

Abstract

SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To address this issue, we propose SuperSVG, a superpixel-based vectorization model that achieves fast and high-precision image vectorization. Specifically, we decompose the input image into superpixels to help the model focus on areas with similar colors and textures. Then, we propose a two-stage self-training framework, where a coarse-stage model is employed to reconstruct the main structure and a refinement-stage model is used for enriching the details. Moreover, we propose a novel dynamic path warping loss to help the refinement-stage model inherit knowledge from the coarse-stage model. Extensive qualitative and quantitative experiments demonstrate the superior performance of our method in terms of reconstruction accuracy and inference time compared to state-of-the-art approaches. The code is available at https://github.com/sjtuplayer/SuperSVG.
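
To make the superpixel decomposition step concrete, here is a minimal sketch of grouping pixels by color and position. The abstract does not name the superpixel algorithm, so the simplified SLIC-style k-means below is an assumption used only for illustration. The names `toy_superpixels`, `superpixel_masks`, `grid`, and `spatial_weight` are hypothetical and do not come from the SuperSVG code. Unlike a real superpixel method, this toy version compares every pixel with every centre and does not enforce connected regions.

```python
def toy_superpixels(image, grid=2, spatial_weight=0.1, iters=5):
    """Cluster a grayscale image (rows of floats in [0, 1]) into up to grid*grid regions.

    Simplified SLIC-style k-means; an illustrative stand-in, not SuperSVG's code.
    """
    h, w = len(image), len(image[0])
    # Seed one centre (intensity, y, x) at the middle of each grid cell.
    centres = []
    for gy in range(grid):
        for gx in range(grid):
            cy, cx = int((gy + 0.5) * h / grid), int((gx + 0.5) * w / grid)
            centres.append((image[cy][cx], float(cy), float(cx)))

    def distance(y, x, centre):
        # Squared intensity difference plus a weighted, normalized spatial term.
        ci, cy, cx = centre
        colour = (image[y][x] - ci) ** 2
        spatial = ((y - cy) / h) ** 2 + ((x - cx) / w) ** 2
        return colour + spatial_weight * spatial

    labels = [[0] * w for _ in range(h)]
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest centre.
        for y in range(h):
            for x in range(w):
                labels[y][x] = min(range(len(centres)),
                                   key=lambda k: distance(y, x, centres[k]))
        # Update step: move each centre to the mean of its members.
        sums = [[0.0, 0.0, 0.0, 0] for _ in centres]
        for y in range(h):
            for x in range(w):
                s = sums[labels[y][x]]
                s[0] += image[y][x]
                s[1] += y
                s[2] += x
                s[3] += 1
        # Centres with no members keep their previous position.
        centres = [(s[0] / s[3], s[1] / s[3], s[2] / s[3]) if s[3] else c
                   for s, c in zip(sums, centres)]
    return labels


def superpixel_masks(labels):
    """Group pixel coordinates by label so each region can be handled separately."""
    masks = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            masks.setdefault(label, []).append((y, x))
    return masks
```

In a pipeline like the one the abstract describes, each entry of `superpixel_masks(labels)` would be vectorized on its own and the resulting paths combined into one SVG; that vectorization step is not sketched here.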

Paper Structure (20 sections, 11 equations, 20 figures, 4 tables, 1 algorithm)

Figures (20)

  • Figure 1: Overview of our SuperSVG: our model first decomposes the image to be vectorized into superpixels, each containing pixels sharing similar colors and contents. The coarse-stage model predicts the path parameters to reconstruct the main structure, and then the coarse paths guided refinement model enriches the details by learning the knowledge from the coarse-stage model. Compared to the previous methods, our SuperSVG achieves both a high vectorization accuracy and fast computation speed.
  • Figure 2: Main framework of our SuperSVG: we decompose the target image into superpixels and vectorize each superpixel separately. We employ an attention-based coarse-stage model to predict SVG paths that reconstruct the main structure of the superpixel. Then, a refinement-stage model guided by the coarse paths is designed to predict more SVG paths to refine details based on the coarse image. Finally, by combining all the predicted SVGs for each superpixel, we obtain an output SVG with good structure and fine details.
  • Figure 3: Illustration of our boundary loss $\mathcal{L}_{Bound}$, which computes the area of the SVG paths that are outside the superpixel mask, and guides the paths to be inside the superpixel.
  • Figure 4: Problem of training the refinement model with $\mathcal{L}_2$ loss alone: optimizing a newly-added path on the canvas by $\mathcal{L}_2$ gradually pulls it to disappear (as a suboptimal local minimum). With our proposed coarse paths guided training and DPW loss, the added path is successfully optimized to resemble the target.
  • Figure 5: Difference between DTW and our DPW. a) Both the DTW and DPW loss calculate the sum of distances of elements colored yellow. The difference is that one generated path can only match one target path in DPW to avoid averaging several target paths. b) The comparison between the training processes.
  • ...and 15 more figures
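
The Figure 5 caption contrasts DPW with classic dynamic time warping (DTW). As background, the sketch below implements only standard DTW between two sequences, using 2D points as stand-ins for per-path parameter vectors (an assumption made for illustration). It does not implement the paper's DPW: the constraint that one generated path may match only one target path is not reproduced, because this excerpt does not give its exact formulation. The function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(generated, target):
    """Classic DTW cost between two sequences of 2D points.

    cost[i][j] is the cheapest alignment of the first i generated elements
    with the first j target elements.
    """
    n, m = len(generated), len(target)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(generated[i - 1], target[j - 1])
            # An element may be matched repeatedly (the two non-diagonal moves),
            # which is what lets one element align with several on the other side.
            cost[i][j] = d + min(cost[i - 1][j],      # reuse target[j-1]
                                 cost[i][j - 1],      # reuse generated[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

In this classic form a single generated element can absorb several target elements at once, which is the behaviour the caption says DPW restricts to avoid averaging several target paths.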