WildSVG: Towards Reliable SVG Generation Under Real-World Conditions

Marco Terral; Haotian Zhang; Tianyang Zhang; Meng Lin; Xiaoqing Xie; Haoran Dai; Darsh Kaushik; Pai Peng; Nicklas Scharpff; David Vazquez; Joan Rodriguez

TL;DR

The paper introduces the WildSVG Benchmark, which consists of two complementary datasets: Natural WildSVG, built from real images containing company logos paired with their SVG annotations, and Synthetic WildSVG, which blends complex SVG renderings into real scenes to simulate difficult conditions.

Abstract

We introduce the task of SVG extraction, which consists of translating specific visual elements in an image into Scalable Vector Graphics (SVG). Existing multimodal models achieve strong results when generating SVGs from clean renderings or textual descriptions, but they fall short in real-world scenarios, where natural images introduce noise, clutter, and domain shifts. A central obstacle to progress is the lack of suitable benchmarks. To address this gap, we introduce the WildSVG Benchmark, formed by two complementary datasets: Natural WildSVG, built from real images containing company logos paired with their SVG annotations, and Synthetic WildSVG, which blends complex SVG renderings into real scenes to simulate difficult conditions. Together, these resources provide the first foundation for systematically benchmarking SVG extraction. We benchmark state-of-the-art multimodal models and find that current approaches perform well below what is needed for reliable SVG extraction in real scenarios. Nonetheless, iterative refinement methods point to a promising path forward, and model capabilities are steadily improving.
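
To make the task concrete, the following is a minimal sketch of a two-step extraction pipeline (detect and crop the logo, then convert the crop to SVG) with an optional iterative-refinement loop. It is not the paper's implementation: `detect`, `generate_svg`, `render`, and `similarity` are hypothetical callables standing in for a logo detector, a multimodal model, an SVG rasterizer, and an image-similarity score, and `max_rounds` and `target` are assumed parameters. Images are plain nested lists of grayscale pixels so the example has no dependencies.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # grayscale pixels, row-major (illustrative stand-in)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def crop(image: Image, box: Box) -> Image:
    """Step 1 helper: cut the detected region out of the full image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def extract_svg(
    image: Image,
    detect: Callable[[Image], Optional[Box]],             # hypothetical logo detector
    generate_svg: Callable[[Image, Optional[str]], str],  # hypothetical VLM call
    render: Callable[[str, int, int], Image],             # hypothetical rasterizer
    similarity: Callable[[Image, Image], float],          # hypothetical score, higher is better
    max_rounds: int = 3,
    target: float = 0.95,
) -> Optional[str]:
    """Two-step extraction (detect + crop, then image-to-SVG) with optional refinement.

    After the first round, the previous SVG is passed back to the model so it can
    revise its output; the best-scoring SVG seen so far is returned.
    """
    box = detect(image)
    if box is None:
        return None  # no logo found, nothing to extract
    region = crop(image, box)
    if not region or not region[0]:
        return None  # degenerate box
    h, w = len(region), len(region[0])
    best_svg: Optional[str] = None
    best_score = float("-inf")
    previous: Optional[str] = None
    for _ in range(max_rounds):
        svg = generate_svg(region, previous)
        score = similarity(render(svg, w, h), region)
        if score > best_score:
            best_svg, best_score = svg, score
        if score >= target:
            break  # good enough, stop refining
        previous = svg
    return best_svg
```

Setting `max_rounds=1` reduces this to a plain two-step pipeline without refinement; a one-step method would instead pass the full image directly to the model and skip `detect` and `crop`.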

Paper Structure (26 sections, 2 equations, 21 figures, 5 tables)

Introduction
Related work
Logo Detection and Extraction
SVG Generation
Dataset Survey
Motivation for SVG extraction
WildSVG Datasets
Natural WildSVG
Synthetic WildSVG
Quality filtering and resulting dataset
Licensing
WildSVG benchmark
Evaluation Metrics
Baselines
Evaluation Setting
...and 11 more sections

Figures (21)

Figure 1: Performance on WildSVG. A two-step pipeline extracts the logo from a real image and converts it to SVG. Comparison of VLMs on the Natural WildSVG dataset.
Figure 2: Creation of the WildSVG Benchmark and evaluation pipeline. The dataset combines a synthetic split, built by rendering real SVGs into generated scenes, and a natural split, created by detecting logos in real images and pairing them with SVG annotations. We evaluate SVG extraction using one-step and two-step multimodal methods and assess outputs through pixel similarity, semantic similarity, code quality, and editability metrics.
Figure 3: Examples of Synthetic WildSVG. Real SVGs extracted from SVG Stack are integrated into realistic scenes using an image generation model.
Figure 4: Examples of Natural WildSVG. Real images with visible logos are paired with SVG versions of those logos obtained from public logo databases.
Figure 5: Qualitative comparison of VLM outputs on the two-step SVG extraction task using the Synthetic split. Given the logo crop produced by the detector, each model generates an SVG reconstruction. Results illustrate varying levels of geometric fidelity, color consistency, and structural correctness across models.
...and 16 more figures
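
The Figure 2 caption groups the evaluation into pixel similarity, semantic similarity, code quality, and editability. The exact metric definitions are not given in this section, so the following is only a minimal sketch of two simple proxies under illustrative assumptions: a pixel score computed as one minus the normalized mean squared error between equally sized 8-bit grayscale rasterizations, and structural counts (elements, paths, path commands, groups) parsed from the SVG source. The names `pixel_similarity` and `svg_code_stats` are hypothetical, and rasterizing the SVG is assumed to happen elsewhere.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Single-letter SVG path commands (absolute and relative).
_PATH_CMD = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]")


def pixel_similarity(pred: List[List[int]], ref: List[List[int]]) -> float:
    """1 - MSE / 255^2 on equally sized 8-bit grayscale images (1.0 = identical)."""
    if len(pred) != len(ref) or any(len(a) != len(b) for a, b in zip(pred, ref)):
        raise ValueError("images must have the same shape")
    diffs = [(p - q) ** 2 for ra, rb in zip(pred, ref) for p, q in zip(ra, rb)]
    if not diffs:
        raise ValueError("images must not be empty")
    mse = sum(diffs) / len(diffs)
    return 1.0 - mse / 255.0 ** 2


def _local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://www.w3.org/2000/svg}path' -> 'path'."""
    return tag.rsplit("}", 1)[-1]


def svg_code_stats(svg_text: str) -> Dict[str, int]:
    """Simple structural statistics for an SVG string (illustrative code-quality proxies)."""
    root = ET.fromstring(svg_text)
    elements = list(root.iter())  # includes the root <svg> element itself
    paths = [e for e in elements if _local_name(e.tag) == "path"]
    return {
        "num_elements": len(elements),
        "num_paths": len(paths),
        "num_path_commands": sum(len(_PATH_CMD.findall(p.get("d", ""))) for p in paths),
        "num_groups": sum(1 for e in elements if _local_name(e.tag) == "g"),
        "num_chars": len(svg_text),
    }
```

Semantic similarity would typically require an embedding model and is omitted here; fewer elements and path commands for the same visual result is one plausible, but unconfirmed, signal of more compact and editable SVG code.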
