SVGEditBench V2: A Benchmark for Instruction-based SVG Editing
Kunato Nishina, Yusuke Matsui
TL;DR
SVGEditBench V2 introduces a large, diverse benchmark for instruction-based SVG editing by pairing original and edited SVGs with editing prompts generated via GPT-4o. The dataset, built from public SVG emoji sources, includes 1,683 triplets and is evaluated with four metrics that cover raster similarity, semantic alignment, and geometric shape fidelity: MSE, DINOv2, CLIPScore, and a two-step Chamfer distance $d_\text{shape}$. The authors benchmark 15 LLMs and 7 LMMs, finding that current models struggle with semantically guided vector edits, particularly complex <path> manipulation and precise numeric control. They show that while some tasks can be solved by simple edits, path-level accuracy remains a key bottleneck, suggesting the need for improved code generation and assistive SVG-edit modules. The benchmark aims to accelerate research in text-to-vector editing and lower barriers to vector graphic processing for both professionals and novices.
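To make two of the metric families above concrete, the following is a minimal sketch of a pixel-wise MSE and a plain symmetric Chamfer distance between 2D point sets. It is an illustration under assumptions, not the benchmark's implementation: the function names `pixel_mse` and `chamfer_distance` are hypothetical, the distance is unsquared Euclidean, and the two-step procedure behind $d_\text{shape}$ is not reproduced here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pixel_mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal size."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def _mean_nearest(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Average distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed form: sum of both directions)."""
    if not p or not q:
        raise ValueError("point sets must be non-empty")
    return _mean_nearest(p, q) + _mean_nearest(q, p)
```

In practice the point sets would be sampled from the outlines of the edited and ground truth shapes; how that sampling is done is left open in this sketch.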
Abstract
Vector formats are widely used to represent icons and sketches, and are also popular for design work. Yet research on editing vector graphics is scarce compared with its raster counterpart. We attribute this to the lack of datasets and benchmarks. We therefore propose SVGEditBench V2, a benchmark dataset for instruction-based SVG editing. SVGEditBench V2 comprises triplets of an original image, a ground truth image, and an editing prompt. We built the dataset by first extracting image pairs from various SVG emoji datasets and then having GPT-4o create the prompt. We found that the triplets obtained through this simple pipeline cover a wide variety of editing tasks. Additionally, we performed the editing tasks with existing LLMs to investigate how well current methods can perform SVG editing. Although there were some successful cases, we found that there is substantial room for improvement.
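The triplet structure described above can be sketched as a small record type. This is only an illustration: the class name `EditTriplet`, its field names, the helper `count_path_elements`, and the sample SVG strings are assumptions, not the dataset's actual schema or contents.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class EditTriplet:
    """One benchmark item: source SVG, target SVG, and the edit instruction."""
    original_svg: str      # SVG code before the edit
    ground_truth_svg: str  # SVG code after the edit
    prompt: str            # natural-language editing instruction


def count_path_elements(svg_code: str) -> int:
    """Parse SVG code and count its <path> elements, ignoring namespaces."""
    root = ET.fromstring(svg_code)  # raises ParseError on malformed XML
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "path")


# Hypothetical example item, written by hand for illustration.
example = EditTriplet(
    original_svg=(
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 36 36">'
        '<path d="M0 0L10 10" fill="#FF0000"/></svg>'
    ),
    ground_truth_svg=(
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 36 36">'
        '<path d="M0 0L10 10" fill="#0000FF"/></svg>'
    ),
    prompt="Change the color of the line from red to blue.",
)
```

Parsing both SVGs of a triplet this way gives a quick check that an item is well-formed XML before any rendering or metric computation.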
