SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities

Kunato Nishina; Yusuke Matsui

TL;DR

SVGEditBench is a quantitative benchmark that measures LLMs' ability to edit SVG code, using six targeted tasks built on Twemoji SVG data. Edits are evaluated by rendering the outputs and comparing image similarity, with an additional compression-based metric for code efficiency. GPT-4 consistently outperforms GPT-3.5 both quantitatively and qualitatively, which supports the validity of the benchmark and its prompts. The work enables objective comparisons of SVG-editing capabilities and points to future work on semantic understanding and fine-tuning for SVG-specific editing.
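
The two kinds of metric mentioned above can be illustrated with a minimal Python sketch. It assumes that image similarity is measured as a mean squared error over already-rendered grayscale pixels and that code efficiency is measured as the ratio of edited to original code length; the summary does not specify either formula, and the function names `mean_squared_error` and `compression_ratio` are hypothetical. Rendering SVG to pixels is left out, since it needs an external renderer.

```python
def mean_squared_error(image_a, image_b):
    """Mean squared error between two equally sized grayscale images.

    Each image is a list of rows; each row is a list of pixel values.
    Assumption: the benchmark's exact similarity measure may differ.
    """
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    squared_diffs = [
        (pixel_a - pixel_b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    ]
    if not squared_diffs:
        raise ValueError("images must contain at least one pixel")
    return sum(squared_diffs) / len(squared_diffs)


def compression_ratio(original_svg, edited_svg):
    """Byte length of the edited SVG relative to the original.

    Assumption: a value below 1.0 means the edited code is shorter.
    """
    return len(edited_svg.encode("utf-8")) / len(original_svg.encode("utf-8"))


# Hypothetical 2x2 renderings: one of four pixels differs by 1.0.
reference = [[0.0, 1.0], [1.0, 0.0]]
candidate = [[0.0, 1.0], [1.0, 1.0]]
print(mean_squared_error(reference, candidate))  # 0.25
```

Under these assumptions a lower error means the rendered edit is closer to the expected answer, and a lower ratio means more compact SVG code.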

Abstract

Text-to-image models have progressed in recent years, and text-to-vector-graphics generation has advanced along with them. SVG, a popular vector graphics format, represents a scene as XML text, so Large Language Models can process SVG code directly. Taking this into account, we focus on editing SVG with LLMs. We propose SVGEditBench, a benchmark for quantitatively assessing LLMs' ability to edit SVG code, and report the results of GPT-4 and GPT-3.5 on it. In the experiments, GPT-4 outperformed GPT-3.5 both quantitatively and qualitatively. The dataset is available at https://github.com/mti-lab/SVGEditBench.
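
As a small illustration of why SVG can be handled as plain text, the Python sketch below parses an SVG string with the standard library and lists its shape elements. The sample image and the helper name `list_shapes` are hypothetical and written for illustration only; they are not taken from the benchmark data.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-shape image, written for illustration only.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 36 36">'
    '<circle cx="18" cy="18" r="18" fill="#FFCC4D"/>'
    '<rect x="10" y="22" width="16" height="3" fill="#664500"/>'
    "</svg>"
)


def list_shapes(svg_code):
    """Return (tag, fill) pairs for the direct children of the <svg> root."""
    root = ET.fromstring(svg_code)
    shapes = []
    for element in root:
        # ElementTree prefixes tags with "{namespace}"; keep the local name.
        local_name = element.tag.split("}")[-1]
        shapes.append((local_name, element.get("fill")))
    return shapes


print(list_shapes(SAMPLE_SVG))
# [('circle', '#FFCC4D'), ('rect', '#664500')]
```

Each XML element maps to one shape, so an edit such as changing a color amounts to rewriting an attribute value in the text.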

Paper Structure (14 sections, 5 figures, 1 table)

This paper contains 14 sections, 5 figures, 1 table.

Introduction
Related Works
Scalable Vector Graphics
Recent Studies on Vector Image Processing
Building the Benchmark
Overview of the Tasks
Selection of SVG Data
Evaluation Tasks and Metrics
Experiments
Quantitative Evaluation of GPT Models
Comparison with the Qualitative Evaluation
Conclusion
Structure of the Prompts
Example of Prompts for Each Task

Figures (5)

Figure 1: An example of an image represented in SVG format. Each XML element corresponds to a single shape or text block, as indicated by the blue arrows.
Figure 2: An overview of the tasks in the proposed benchmark and an example of the prompt in the Change Color task.
Figure 3: Sample images in the Twemoji dataset. The top row shows some images in the dataset, and the bottom row shows the ones removed.
Figure 4: Examples of answers for each task used in the proposed benchmark. Note that for the Compression task, the rendered result should not change from the original.
Figure 5: A qualitative evaluation of the SVG editing results generated by LLMs. The two rows show an example of the Change Color task and the Upside-Down task, respectively.
