
SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

Ximing Xing, Qian Yu, Chuang Wang, Haitao Zhou, Jing Zhang, Dong Xu

TL;DR

SVGDreamer++ tackles the editability and diversity gaps in text-guided SVG generation. It builds on a two-stage framework (SIVE for semantic-driven vectorization and VPSD for distribution-based refinement) and extends it with HIVE for hierarchical, segmentation-guided vectorization and Adaptive Vector Primitives Control for dynamic path counts. The approach leverages segmentation priors and diffusion model attention, uses LoRA-based distribution estimation, and applies Reward Feedback Learning to improve aesthetics and convergence. Empirical results show substantial gains in editability, visual quality, and diversity across multiple vector styles and in applications such as poster design and vector assets; code and demos are to be released.
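As summarized above, VPSD refines k sets of SVG parameters ("particles"), and a pre-trained reward model is used to re-weight them. This summary does not give the exact weighting rule, so the sketch below assumes a simple softmax over per-particle reward scores and uses those weights to combine per-particle losses. The function names, the `temperature` parameter, and the toy scores are hypothetical; in the paper the scores would come from the reward model, which is replaced here by precomputed numbers.

```python
import math
from typing import List


def reward_weights(rewards: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax over per-particle reward scores (hypothetical weighting rule).

    Higher-reward particles receive larger weights, and the weights sum to 1.
    Subtracting the peak score keeps exp() numerically stable.
    """
    peak = max(rewards)
    exps = [math.exp((r - peak) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_particle_loss(losses: List[float], rewards: List[float],
                           temperature: float = 1.0) -> float:
    """Combine per-particle losses using the reward-derived weights."""
    weights = reward_weights(rewards, temperature)
    return sum(w * l for w, l in zip(weights, losses))


if __name__ == "__main__":
    # Toy scores for k = 3 particles (stand-ins for reward-model outputs).
    rewards = [0.2, 1.5, -0.3]
    losses = [0.8, 0.4, 1.1]
    print([round(w, 3) for w in reward_weights(rewards)])
    print(round(weighted_particle_loss(losses, rewards), 3))
```

A lower `temperature` concentrates weight on the best-scoring particle. Figure 3 describes the reward model as guiding the training of the LoRA estimation network, which may differ from this scalar re-weighting.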

Abstract

Recently, text-guided scalable vector graphics (SVG) synthesis has demonstrated significant potential in domains such as iconography and sketching. However, SVGs generated by existing Text-to-SVG methods often lack editability and exhibit deficiencies in visual quality and diversity. In this paper, we propose a novel text-guided vector graphics synthesis method to address these limitations. To enhance the editability of output SVGs, we introduce a Hierarchical Image VEctorization (HIVE) framework that operates at the semantic object level and supervises the optimization of components within the vector object. This approach facilitates the decoupling of vector graphics into distinct object and component levels. Our proposed HIVE algorithm, informed by image segmentation priors, not only ensures a more precise representation of vector graphics but also enables fine-grained editing capabilities within vector objects. To improve the diversity of output SVGs, we present a Vectorized Particle-based Score Distillation (VPSD) approach. VPSD addresses over-saturation issues in existing methods and enhances sample diversity. A pre-trained reward model is incorporated to re-weight vector particles, improving aesthetic appeal and enabling faster convergence. Additionally, we design a novel adaptive vector primitives control strategy, which allows for the dynamic adjustment of the number of primitives, thereby enhancing the presentation of graphic details. Extensive experiments validate the effectiveness of the proposed method, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. We also show that our method supports up to six distinct vector styles and can generate high-quality vector assets suitable for stylized vector design and poster design. Code and demo will be released at: http://ximinng.github.io/SVGDreamerV2Project/
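The adaptive vector primitives control strategy adjusts the number of primitives during optimization, and Figure 4 describes it as building vector paths from gradient graphs. A minimal sketch, assuming the gradient is available as a 2D magnitude map: new paths are seeded at the centers of the grid cells with the highest mean gradient, and paths whose opacity has dropped below a floor are pruned. The function names, cell size, thresholds, budget, and the opacity-based pruning rule are all hypothetical choices for illustration, not the paper's.

```python
from typing import List, Tuple

Grid = List[List[float]]


def propose_new_paths(grad_map: Grid, cell: int = 4, threshold: float = 0.5,
                      budget: int = 3) -> List[Tuple[int, int]]:
    """Pick up to `budget` grid cells whose mean gradient magnitude exceeds
    `threshold`; return their (row, col) centers as seeds for new paths."""
    h, w = len(grad_map), len(grad_map[0])
    scored = []
    for r0 in range(0, h, cell):
        r1 = min(r0 + cell, h)
        for c0 in range(0, w, cell):
            c1 = min(c0 + cell, w)
            vals = [grad_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(vals) / len(vals)
            if mean > threshold:
                scored.append((mean, ((r0 + r1 - 1) // 2, (c0 + c1 - 1) // 2)))
    # Highest-gradient cells first: large gradients suggest regions that the
    # current paths fit poorly. The stable sort keeps scan order for ties.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [center for _, center in scored[:budget]]


def prune_paths(opacities: List[float], min_opacity: float = 0.05) -> List[int]:
    """Return indices of paths to keep (hypothetical rule: drop near-transparent paths)."""
    return [i for i, a in enumerate(opacities) if a >= min_opacity]


if __name__ == "__main__":
    # 8x8 toy gradient map with a high-error patch in the top-left corner.
    grad = [[1.0 if r < 4 and c < 4 else 0.1 for c in range(8)] for r in range(8)]
    print(propose_new_paths(grad))        # → [(1, 1)]
    print(prune_paths([0.9, 0.01, 0.3]))  # → [0, 2]
```

Capping additions with `budget` keeps the path count from growing unboundedly in a single step, while pruning lets the total number of primitives shrink as well as grow.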

Paper Structure

This paper contains 29 sections, 8 equations, 17 figures, 2 tables, 1 algorithm.

Figures (17)

  • Figure 1: SVGs produced by SVGDreamer++. Given a text prompt, SVGDreamer++ can generate a variety of vector graphics. SVGDreamer++ is a versatile tool that works with various vector styles without being limited to a specific prompt suffix. Differently colored suffixes indicate the different styles. The style is governed by the vector primitives.
  • Figure 2: The pipeline of SIVE. SIVE comprises two primary modules: primitive initialization and semantic-aware optimization. The primitive initialization module leverages diffusion model attention priors to initially delineate the paths of the corresponding vector objects. Subsequently, an attention-based mask loss function is introduced to facilitate the hierarchical optimization of these vector objects.
  • Figure 3: The process of Vectorized Particle-based Score Distillation. VPSD accepts $k$ sets of SVG parameters as input. It models an SVG as a distribution over vector path and color parameters and estimates this distribution with a LoRA network. By estimating the SVG parameter distribution, VPSD produces more diverse outputs than VectorFusion (VF) vectorfusion_jain_2023. To further improve the aesthetic quality of the vector outputs, a pretrained reward model imagereward_xu_2023 is used to guide the training of the estimation network.
  • Figure 4: Overview of SVGDreamer++. Our method consists of two phases: hierarchical image vectorization (Sec. \ref{sec:hive}) and optimized synthesis of diverse SVGs via VPSD (Sec. \ref{sec:vpsd}). An additional module, Adaptive Vector Primitives Control (Sec. \ref{sec:adaptive_path_control}), can be plugged into both HIVE and VPSD. HIVE uses two stages of mask generation (dotted box): coarse mask generation guided by prompt words and fine-grained mask generation guided by the attention distribution, which together decouple the components of the vector graphic. The output of HIVE can serve as the input to VPSD, which maintains $k$ sets of SVG parameters to obtain diverse results. The brown dotted box represents adaptive vector primitives control, which dynamically builds vector paths based on gradient graphs to improve the quality of SVG synthesis.
  • Figure 5: The limitation of SIVE. When the cross attention map extracted from the LDM has a much lower resolution (e.g., 32x32) compared to the target vector graphic (e.g., 512x512), the results may have inaccurate boundaries.
  • ...and 12 more figures
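Figures 2 and 5 describe how SIVE derives object masks from the latent diffusion model's cross-attention maps, uses them in an attention-based mask loss, and why their low resolution limits boundary accuracy. The sketch below assumes nearest-neighbor upsampling, a fixed threshold on the peak-normalized attention, and a masked mean-squared error as the mask loss; these choices and all function names are hypothetical illustrations rather than the paper's exact formulation.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(att: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbor upsampling of a low-resolution attention map."""
    in_h, in_w = len(att), len(att[0])
    return [[att[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def attention_mask(att: Grid, out_h: int, out_w: int, threshold: float = 0.5) -> Grid:
    """Binary object mask: upsample, normalize by the peak value, then threshold."""
    up = upsample_nearest(att, out_h, out_w)
    peak = max(max(row) for row in up)
    if peak <= 0:
        return [[0.0] * out_w for _ in range(out_h)]
    return [[1.0 if v / peak >= threshold else 0.0 for v in row] for row in up]


def masked_mse(rendered: Grid, target: Grid, mask: Grid) -> float:
    """Mean squared error restricted to pixels inside the mask."""
    num, den = 0.0, 0.0
    for r_row, t_row, m_row in zip(rendered, target, mask):
        for x, y, m in zip(r_row, t_row, m_row):
            num += m * (x - y) ** 2
            den += m
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # A 2x2 attention map upsampled to 4x4: each attention cell becomes a 2x2
    # block. At 32x32 -> 512x512 each cell would cover a 16x16 block, so under
    # this assumption object boundaries come out blocky (cf. Figure 5).
    att = [[0.0, 1.0], [0.0, 0.0]]
    mask = attention_mask(att, 4, 4)
    for row in mask:
        print(row)
    rendered = [[0.0] * 4 for _ in range(4)]
    target = [[1.0] * 4 for _ in range(4)]
    print(masked_mse(rendered, target, mask))
```

Smoother upsampling (for example bilinear) would soften the block edges but cannot recover detail finer than one attention cell, which is consistent with the boundary limitation shown in Figure 5.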