Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
Ronghuan Wu, Wanchao Su, Jing Liao
TL;DR
Chat2SVG generates high-quality SVGs with regular, semantically meaningful shapes from text by combining Large Language Models (LLMs) for template generation with image diffusion models for geometry refinement. The method introduces an SVG-oriented prompt design, a dual-stage optimization in latent and point spaces, and an iterative natural-language editing loop that maintains semantic and visual coherence. Experiments show better visual fidelity, path regularity, and semantic alignment with the text prompt than strong baselines, and a user study favors Chat2SVG outputs. The system makes professional vector graphics creation and interactive editing more accessible, with room for further enhancement and extension to related vector formats.
Abstract
Scalable Vector Graphics (SVG) has become the de facto standard for vector graphics in digital design, offering resolution independence and precise control over individual elements. Despite these advantages, creating high-quality SVG content remains challenging, as it demands technical expertise with professional editing software and a considerable time investment to craft complex shapes. Recent text-to-SVG generation methods aim to make vector graphics creation more accessible, but they still encounter limitations in shape regularity, generalization ability, and expressiveness. To address these challenges, we introduce Chat2SVG, a hybrid framework that combines the strengths of Large Language Models (LLMs) and image diffusion models for text-to-SVG generation. Our approach first uses an LLM to generate semantically meaningful SVG templates from basic geometric primitives. Then, guided by image diffusion models, a dual-stage optimization pipeline refines paths in latent space and adjusts point coordinates to enhance geometric complexity. Extensive experiments show that Chat2SVG outperforms existing methods in visual fidelity, path regularity, and semantic alignment. Additionally, our system enables intuitive editing through natural language instructions, making professional vector graphics creation accessible to all users.
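To make the idea of an SVG template built from basic geometric primitives concrete, the following minimal Python sketch assembles one by hand and serializes it. It is only an illustration of what such a template might look like, not the Chat2SVG implementation: the `build_template` helper, the element ids, and the "red balloon" layout are all hypothetical, and no LLM or diffusion model is involved.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_template(primitives, size=512):
    """Serialize a list of (tag, attributes) primitives into an SVG string.

    Hypothetical helper for illustration; not part of Chat2SVG.
    """
    ET.register_namespace("", SVG_NS)  # emit a default xmlns, no prefix
    root = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"viewBox": f"0 0 {size} {size}", "width": str(size), "height": str(size)},
    )
    for tag, attrs in primitives:
        # Each primitive becomes one named, individually editable element.
        ET.SubElement(root, f"{{{SVG_NS}}}{tag}", {k: str(v) for k, v in attrs.items()})
    return ET.tostring(root, encoding="unicode")


# Assumed example of a template for the prompt "a red balloon":
# every shape is a basic primitive with a semantic id.
balloon = [
    ("ellipse", {"id": "balloon_body", "cx": 256, "cy": 200,
                 "rx": 110, "ry": 140, "fill": "#e63946"}),
    ("polygon", {"id": "balloon_knot",
                 "points": "246,338 266,338 256,356", "fill": "#c1121f"}),
    ("line", {"id": "string", "x1": 256, "y1": 356, "x2": 256, "y2": 480,
              "stroke": "#333333", "stroke-width": 3}),
]

svg_text = build_template(balloon)
print(svg_text)
```

In a pipeline of the kind the abstract describes, a template like this would be the starting point that the later optimization stages refine into more detailed paths; the semantic ids are one plausible way to keep individual elements addressable for natural-language edits.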
