SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities
Kunato Nishina, Yusuke Matsui
TL;DR
SVGEditBench is a quantitative benchmark for measuring LLMs' ability to edit SVG code, built from six targeted tasks and Twemoji SVG data. Edits are evaluated by rendering the outputs and comparing image similarity, with additional compression-based metrics for code efficiency. GPT-4 consistently outperforms GPT-3.5 both quantitatively and qualitatively, which supports the validity of the benchmark and its prompts. The work enables objective comparisons of SVG-editing capabilities and points to future work on semantic understanding and fine-tuning for SVG-specific editing.
Abstract
Text-to-image models have shown progress in recent years, and generating vector graphics from text has advanced along with them. SVG is a popular vector graphics format that represents a scene as XML text, so Large Language Models (LLMs) can process SVG code directly. Taking this into account, we focus on editing SVG with LLMs. To evaluate this ability quantitatively, we propose SVGEditBench, a benchmark for assessing LLMs' ability to edit SVG code. We also report the results of GPT-4 and GPT-3.5 on the proposed benchmark. In the experiments, GPT-4 outperformed GPT-3.5 both quantitatively and qualitatively. The dataset is available at https://github.com/mti-lab/SVGEditBench.
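As a rough illustration of the kind of quantitative evaluation summarized above (comparing rendered outputs by image similarity and measuring code efficiency by compression), the following is a minimal sketch under stated assumptions. The section does not specify the exact metrics, so mean squared error, zlib, and the function names `mean_squared_error`, `compressed_size`, and `compression_ratio` are all hypothetical choices made for illustration. Rasterizing the SVG is assumed to happen elsewhere; images are passed in as lists of grayscale pixel rows.

```python
import zlib


def mean_squared_error(image_a, image_b):
    """Pixel-wise MSE between two equally sized grayscale images.

    Each image is a list of rows, each row a list of numeric pixel values.
    MSE is an assumed stand-in for the benchmark's image-similarity measure.
    """
    same_height = len(image_a) == len(image_b)
    if not same_height or any(len(ra) != len(rb) for ra, rb in zip(image_a, image_b)):
        raise ValueError("images must have the same dimensions")

    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return total / count


def compressed_size(svg_code):
    """Size in bytes of the zlib-compressed SVG text (zlib is an assumption)."""
    return len(zlib.compress(svg_code.encode("utf-8"), 9))


def compression_ratio(original_svg, edited_svg):
    """Compressed size of the edited SVG relative to the original SVG."""
    return compressed_size(edited_svg) / compressed_size(original_svg)


if __name__ == "__main__":
    # Toy 2x2 "renderings": a ground-truth image and a model output.
    ground_truth = [[0, 0], [0, 0]]
    model_output = [[0, 0], [0, 2]]
    print("MSE:", mean_squared_error(ground_truth, model_output))

    original = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5" fill="#FF0000"/></svg>'
    edited = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5" fill="#0000FF"/></svg>'
    print("Compression ratio:", compression_ratio(original, edited))
```

Under these assumptions, a lower MSE means the rendered edit is closer to the ground-truth rendering, and a compression ratio near 1 means the edit did not inflate the code relative to the original.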
