Leveraging Large Language Models For Scalable Vector Graphics Processing: A Review
Boris Malashenko, Ivan Jarsky, Valeria Efimova
TL;DR
This paper surveys how large language models can be applied to the processing of Scalable Vector Graphics (SVG), focusing on generation, editing, and understanding. It catalogs specialized SVG-focused models (IconShop, StrokeNUWA, StarVector, LLM4SVG) and general-purpose LLMs, and evaluates them across multiple benchmarks (SVGEditBench, VGBench, SGP-Bench, SVG Taxonomy, Image-text bridging). The authors provide a cross-model, cross-benchmark analysis showing that reasoning-enhanced LLMs generally outperform non-reasoning models in SVG tasks, while data quality and benchmark richness remain major bottlenecks. They conclude with a call for richer, more diverse datasets and tokenization strategies, along with future directions toward multi-modal and more robust SVG-friendly LLM architectures to broaden practical impact in designer and developer workflows.
Abstract
In recent years, rapid advances in computer vision have significantly improved the processing and generation of raster images. However, vector graphics, which are essential in digital design because of their scalability and ease of editing, have been relatively understudied. Traditional vectorization techniques, which are often used in vector generation, suffer from long processing times and excessive output complexity, limiting their usability in practical applications. The advent of large language models (LLMs) has opened new possibilities for the generation, editing, and analysis of vector graphics, particularly in the SVG format, which is inherently text-based and well-suited for integration with LLMs. This paper provides a systematic review of existing LLM-based approaches for SVG processing, categorizing them into three main tasks: generation, editing, and understanding. We review notable models such as IconShop, StrokeNUWA, and StarVector, highlighting their strengths and limitations. We also analyze benchmark datasets designed for assessing SVG-related tasks, including SVGEditBench, VGBench, and SGP-Bench, and conduct a series of experiments to evaluate various LLMs in these domains. Our results show that reasoning-enhanced models outperform standard LLMs on vector graphics tasks, particularly in generation and understanding. Our findings also underscore the need for more diverse and richly annotated datasets to further improve LLM capabilities in vector graphics tasks.
