
WildSVG: Towards Reliable SVG Generation Under Real-World Conditions

Marco Terral, Haotian Zhang, Tianyang Zhang, Meng Lin, Xiaoqing Xie, Haoran Dai, Darsh Kaushik, Pai Peng, Nicklas Scharpff, David Vazquez, Joan Rodriguez

TL;DR

This paper introduces the WildSVG Benchmark, which consists of two complementary datasets: Natural WildSVG, built from real images containing company logos paired with their SVG annotations, and Synthetic WildSVG, which blends complex SVG renderings into real scenes to simulate difficult conditions.
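According to Figure 3, Synthetic WildSVG integrates real SVGs into scenes using an image generation model. The sketch below does not reproduce that method. It shows only the simplest stand-in for "blending an SVG rendering into a scene": standard "over" alpha compositing of an already-rasterized RGBA logo onto an RGB background with NumPy. The function name `composite_logo` and the assumption that the SVG has already been rasterized to an 8-bit RGBA array are illustrative, not taken from the paper.

```python
import numpy as np


def composite_logo(scene: np.ndarray, logo_rgba: np.ndarray, top: int, left: int) -> np.ndarray:
    """Paste a rasterized RGBA logo onto an RGB scene with "over" alpha blending.

    Illustrative stand-in only: the benchmark itself uses an image generation
    model, not plain alpha compositing.
    """
    out = scene.astype(np.float32).copy()
    h, w = logo_rgba.shape[:2]
    region = out[top:top + h, left:left + w]
    if region.shape[:2] != (h, w):
        raise ValueError("logo does not fit inside the scene at this position")
    rgb = logo_rgba[..., :3].astype(np.float32)
    # Normalize the 8-bit alpha channel to [0, 1]; keep a trailing axis for broadcasting.
    alpha = logo_rgba[..., 3:4].astype(np.float32) / 255.0
    out[top:top + h, left:left + w] = alpha * rgb + (1.0 - alpha) * region
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Fully opaque logo pixels replace the scene, fully transparent ones leave it unchanged, and partial alpha mixes the two. This is exactly the kind of clean, geometrically aligned insertion that a generative blending step is meant to go beyond.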

Abstract

We introduce the task of SVG extraction, which consists of translating specific visual content in an image into scalable vector graphics (SVG). Existing multimodal models achieve strong results when generating SVGs from clean renderings or textual descriptions, but they fall short in real-world scenarios where natural images introduce noise, clutter, and domain shifts. A central obstacle to progress in this direction is the lack of suitable benchmarks. To address this need, we introduce the WildSVG Benchmark, formed by two complementary datasets: Natural WildSVG, built from real images containing company logos paired with their SVG annotations, and Synthetic WildSVG, which blends complex SVG renderings into real scenes to simulate difficult conditions. Together, these resources provide the first foundation for systematic benchmarking of SVG extraction. We benchmark state-of-the-art multimodal models and find that current approaches perform well below what is needed for reliable SVG extraction in real scenarios. Nonetheless, iterative refinement methods point to a promising path forward, and model capabilities are steadily improving.
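The abstract names iterative refinement as a promising direction but does not specify an algorithm here. The following is a minimal sketch of a generic render-compare-revise loop, assuming two hypothetical callables: `propose` (for example, a multimodal model that sees the target image plus its previous SVG and error) and `render` (an SVG rasterizer). Pixel mean squared error stands in as the feedback signal; images are simplified to flat lists of grayscale values. None of these names or choices come from the paper.

```python
from typing import Callable, List, Optional, Tuple

Image = List[float]  # illustrative: flattened grayscale pixels


def pixel_mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def refine_svg(
    target: Image,
    propose: Callable[[Image, Optional[str], Optional[float]], str],  # hypothetical model call
    render: Callable[[str], Image],  # hypothetical SVG rasterizer
    max_rounds: int = 3,
) -> Tuple[str, float]:
    """Repeatedly ask for a revised SVG, keeping the lowest-error candidate."""
    best_svg, best_err = "", float("inf")
    svg: Optional[str] = None
    err: Optional[float] = None
    for _ in range(max_rounds):
        # The proposer sees the target plus its previous attempt and that attempt's error.
        svg = propose(target, svg, err)
        err = pixel_mse(render(svg), target)
        if err < best_err:
            best_svg, best_err = svg, err
        if err == 0.0:
            break  # perfect pixel match; further rounds cannot improve
    return best_svg, best_err
```

Keeping the best candidate rather than the last one guards against a revision step that makes the output worse. A real system would likely replace the pixel error with a richer signal, such as a rendered-image comparison the model can inspect directly.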

Paper Structure (26 sections, 2 equations, 21 figures, 5 tables)

Figures (21)

  • Figure 1: Performance on WildSVG. The logo is extracted from a real image and converted to SVG using a two-step pipeline. Comparison of VLMs on the WildSVG natural dataset.
  • Figure 2: Creation of the WildSVG Benchmark and evaluation pipeline. The dataset combines a synthetic split, built by rendering real SVGs into generated scenes, and a natural split, created by detecting logos in real images and pairing them with SVG annotations. We evaluate SVG extraction using one-step and two-step multimodal methods and assess outputs through pixel similarity, semantic similarity, code quality, and editability metrics.
  • Figure 3: Examples of Synthetic WildSVG. Real SVGs extracted from SVG Stack are integrated into realistic scenarios using an image generation model.
  • Figure 4: Examples of Natural WildSVG. Real images with visible logos are paired with SVG logos extracted from public logo databases.
  • Figure 5: Qualitative comparison of VLM outputs on the two-step SVG extraction task using the Synthetic split. Given the logo crop produced by the detector, each model generates an SVG reconstruction. Results illustrate varying levels of geometric fidelity, color consistency, and structural correctness across models.
  • ...and 16 more figures
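Figure 2 lists code quality and editability among the evaluation criteria, but their exact definitions are not given in this section. As a hedged illustration only, the sketch below uses Python's standard-library `xml.etree.ElementTree` to compute a few simple structural proxies for an SVG output: whether it parses as XML with an `<svg>` root, how many drawing primitives and groups it contains, and how many distinct fill colors it uses. The function `svg_code_stats` and the choice of proxies are assumptions, not the benchmark's metrics.

```python
import xml.etree.ElementTree as ET

# Illustrative set of SVG drawing primitives counted as "shapes".
SHAPE_TAGS = {"path", "rect", "circle", "ellipse", "line", "polyline", "polygon", "text"}


def _local(tag: str) -> str:
    # Strip an XML namespace, e.g. "{http://www.w3.org/2000/svg}path" -> "path".
    return tag.rsplit("}", 1)[-1]


def svg_code_stats(svg_text: str) -> dict:
    """Hypothetical structural proxies for SVG code quality and editability."""
    invalid = {"valid": False, "shapes": 0, "groups": 0, "fills": 0}
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return invalid  # not well-formed XML
    if _local(root.tag) != "svg":
        return invalid  # well-formed, but not an SVG document
    shapes = groups = 0
    fills = set()
    for el in root.iter():  # includes the root element itself
        name = _local(el.tag)
        if name in SHAPE_TAGS:
            shapes += 1
        elif name == "g":
            groups += 1
        fill = el.get("fill")
        if fill and fill.lower() != "none":
            fills.add(fill.lower())  # normalize hex case so #FF0000 == #ff0000
    return {"valid": True, "shapes": shapes, "groups": groups, "fills": len(fills)}
```

Counts like these say nothing about visual fidelity on their own; they would complement, not replace, the pixel and semantic similarity measures the caption also mentions.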