SVGEditBench V2: A Benchmark for Instruction-based SVG Editing

Kunato Nishina, Yusuke Matsui

TL;DR

SVGEditBench V2 introduces a large, diverse benchmark for instruction-based SVG editing by pairing original and edited SVGs with editing prompts generated via GPT-4o. The dataset, built from public SVG emoji sources, includes 1,683 triplets and is evaluated with four metrics that cover raster similarity, semantic alignment, and geometric shape fidelity: MSE, DINOv2, CLIPScore, and a two-step Chamfer distance $d_\text{shape}$. The authors benchmark 15 LLMs and 7 LMMs, finding that current models struggle with semantically guided vector edits, particularly in complex <path> manipulation and precise numeric control. They show that while some tasks can be solved by simple edits, path-level accuracy remains a key bottleneck, suggesting the need for improved code generation and assistive SVG-edit modules. The benchmark aims to accelerate research in text-to-vector editing and lower barriers to vector graphic processing for both professionals and novices.
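To make two of the metric families above concrete, here is a minimal sketch of a pixel-wise MSE and a plain symmetric Chamfer distance between 2D point sets. It is an illustration only: the function names `mse` and `chamfer_distance` are hypothetical, the inputs are assumed to be already rasterized pixel values and already sampled shape points, and this is the textbook one-step Chamfer distance, not the paper's two-step $d_\text{shape}$.

```python
from math import hypot


def mse(a, b):
    """Mean squared error between two non-empty, equally sized flat pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two non-empty lists of (x, y) points.

    Assumption: this is the standard formulation (mean nearest-neighbour
    distance in both directions, summed), not the benchmark's two-step variant.
    """
    def one_way(src, dst):
        # For each source point, take the distance to its nearest destination point.
        return sum(
            min(hypot(sx - dx, sy - dy) for dx, dy in dst) for sx, sy in src
        ) / len(src)

    return one_way(p, q) + one_way(q, p)
```

Both functions return 0 for identical inputs and grow as the edited output drifts from the ground truth, which is the property a lower-is-better editing metric needs.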

Abstract

Vector formats have long been popular for representing icons and sketches and are widely used for design purposes. In image editing, however, research on vector graphics is scarce compared with its raster counterpart. We attribute this to the lack of datasets and benchmarks. We therefore propose SVGEditBench V2, a benchmark dataset for instruction-based SVG editing. SVGEditBench V2 comprises triplets of an original image, a ground truth image, and an editing prompt. We built the dataset by first extracting image pairs from various SVG emoji datasets and then having GPT-4o create the prompt for each pair. We found that the triplets obtained through this simple pipeline cover a wide variety of editing tasks. Additionally, we ran the editing tasks on existing LLMs to investigate how well current methods can perform SVG editing. Although there were some successful cases, we found that there is substantial room for improvement.
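The triplet structure described in the abstract can be sketched as a small record type. This is a hypothetical illustration, not the dataset's actual schema: the class name `EditTriplet`, its field names, and the `build_query` helper (including the way it joins the prompt and the SVG source) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditTriplet:
    """One benchmark item: SVG before editing, SVG after editing, and the instruction."""
    original_svg: str      # SVG source code before the edit
    ground_truth_svg: str  # SVG source code after the edit (reference answer)
    prompt: str            # natural-language editing instruction


def build_query(triplet: EditTriplet) -> str:
    """Assemble the text given to a model (assumed format).

    The ground truth is deliberately left out: it is used only for scoring.
    """
    return f"{triplet.prompt}\n\n{triplet.original_svg}"


example = EditTriplet(
    original_svg='<svg><circle cx="5" cy="5" r="4" fill="red"/></svg>',
    ground_truth_svg='<svg><circle cx="5" cy="5" r="4" fill="blue"/></svg>',
    prompt="Change the circle from red to blue.",
)
query = build_query(example)
```

Keeping the ground truth out of the query mirrors the evaluation setting: a model sees only the instruction and the original SVG, and its output is compared against the reference afterwards.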

Paper Structure

This paper contains 17 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: SVGEditBench V2 is a collection of triplets consisting of SVG graphics before and after editing and the editing prompt. We extracted the images from multiple publicly available emoji datasets and employed GPT-4o to generate the prompts. This straightforward pipeline can produce a variety of editing tasks. The tasks range from transforming elements to changing the overall style. A high-level understanding of the graphics is necessary to perform these tasks.
  • Figure 2: Overview of our method. Our benchmarking pipeline comprises two main parts: dataset creation and model evaluation. Refer to the method section and the supplementary material for more details.