SVGDreamer: Text Guided SVG Generation with Diffusion Model

Ximing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu

TL;DR

SVGDreamer tackles the challenge of text-guided SVG generation by decoupling semantic vectorization from refinement. It introduces SIVE to achieve editable, object-level vectorization via cross-attention-guided initialization and attention-mask optimization, and VPSD to refine vector graphics by modeling a distribution over vector primitives with a reward-guided loop and LoRA-based diffusion priors. The approach addresses key drawbacks of prior SDS-based methods, notably shape over-smoothing, color over-saturation, and limited diversity, while delivering improved editability and stylistic variety. Extensive experiments demonstrate superior performance over baselines in fidelity, diversity, and text alignment, with practical applications in posters and icons.

Abstract

Recently, text-guided scalable vector graphics (SVGs) synthesis has shown promise in domains such as iconography and sketch. However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity. To address these limitations, we propose a novel text-guided vector graphics synthesis method called SVGDreamer. SVGDreamer incorporates a semantic-driven image vectorization (SIVE) process that enables the decomposition of synthesis into foreground objects and background, thereby enhancing editability. Specifically, the SIVE process introduces attention-based primitive control and an attention-mask loss function for effective control and manipulation of individual elements. Additionally, we propose a Vectorized Particle-based Score Distillation (VPSD) approach to address issues of shape over-smoothing, color over-saturation, limited diversity, and slow convergence of the existing text-to-SVG generation methods by modeling SVGs as distributions of control points and colors. Furthermore, VPSD leverages a reward model to re-weight vector particles, which improves aesthetic appeal and accelerates convergence. Extensive experiments are conducted to validate the effectiveness of SVGDreamer, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. Project page: https://ximinng.github.io/SVGDreamer-project/
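
As a rough illustration of treating SVGs as particles, the sketch below represents each SVG as a list of paths (control points plus a fill color), keeps k such particles, and converts per-particle reward scores into normalized weights. The names `Path`, `random_particle`, and `softmax_weights`, the canvas size, the placeholder reward values, and the softmax weighting itself are assumptions made for illustration; the abstract does not specify how the reward model re-weights particles, so this should not be read as the authors' implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Path:
    """One vector primitive: control points plus an RGBA fill color (hypothetical layout)."""
    points: list  # list of (x, y) control points in canvas coordinates
    color: tuple  # (r, g, b, a), each channel in [0, 1]


def random_particle(num_paths, points_per_path, rng, canvas=224.0):
    """Sample one SVG 'particle': a set of paths with random control points and colors."""
    return [
        Path(
            points=[
                (rng.uniform(0.0, canvas), rng.uniform(0.0, canvas))
                for _ in range(points_per_path)
            ],
            color=tuple(rng.random() for _ in range(4)),
        )
        for _ in range(num_paths)
    ]


def softmax_weights(rewards, temperature=1.0):
    """Map per-particle reward scores to weights that sum to 1 (an assumed weighting rule)."""
    highest = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - highest) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
k = 4  # number of SVG particles optimized together
particles = [random_particle(num_paths=8, points_per_path=4, rng=rng) for _ in range(k)]

# Placeholder scores; in practice a reward model would score each rendered particle.
rewards = [0.1, 0.7, 0.3, 0.9]
weights = softmax_weights(rewards)
```

In a full pipeline, weights like these could scale each particle's contribution to the update so that higher-scoring SVGs have more influence; rendering and the diffusion prior are omitted here.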

Paper Structure (30 sections, 7 equations, 16 figures, 2 tables)

Figures (16)

  • Figure 1: Given a text prompt, SVGDreamer can generate a variety of vector graphics. SVGDreamer is a versatile tool that can work with various vector styles without being limited to a specific prompt suffix. We utilize various colored suffixes to indicate different styles. The style is governed by vector primitives.
  • Figure 2: Overview of SVGDreamer. The method consists of two parts: semantic-driven image vectorization (SIVE) and SVG synthesis through VPSD optimization. The result obtained from SIVE can be used as input of VPSD for further refinement.
  • Figure 3: The process of Vectorized Particle-based Score Distillation. VPSD allows $k$ SVGs as input and simultaneously optimizes $k$ sets of SVG parameters.
  • Figure 4: Qualitative comparison of different methods. Note that DiffSketcher was originally designed for vector sketch generation; therefore, we re-implemented it to generate RGB vector graphics.
  • Figure 5: Examples of vector assets created by SVGDreamer. We specify foreground content as an SVG asset through a text prompt. To create assets that fit an SVG style, such as flat polygon vector, we constrain the vector representation by using a prompt modifier that encourages the appropriate style: * ... on a white background, full body action pose, complete body, concept art, flat 2d vector icon.
  • ...and 11 more figures