Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

Haomin Wang, Qi Wei, Qianli Ma, Shengyuan Ding, Jinhui Yin, Kai Chen, Hongjie Zhang

Abstract

With the rapid advancement of vision-language models, an increasing number of studies have explored their potential for SVG generation tasks. Although existing approaches improve performance by constructing large-scale SVG datasets and introducing SVG-specific tokens, they still suffer from limited generalization, redundant paths in code outputs, and a lack of explicit reasoning. In this work, we present CTRL-S (Chain-of-Thought Reinforcement Learning for SVG), a unified framework that introduces a chain-of-thought mechanism to explicitly expose the model's reasoning process during SVG generation. To support this structured reasoning, we construct SVG-Sophia, a high-quality dataset containing 145K samples across SVG code refinement, Text-to-SVG, and Image-to-SVG tasks. By training the model to generate group-level structured SVG code, CTRL-S significantly improves structural coherence and visual fidelity. Furthermore, we adopt the GRPO algorithm and design a multi-reward optimization framework, incorporating DINO, image-text similarity, format, and code efficiency rewards. Through joint multi-reward optimization and multi-task training, our approach systematically enhances overall generation capabilities. Extensive experiments show that CTRL-S outperforms existing methods, achieving higher task success rates, superior SVG code quality, and exceptional visual fidelity.
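To make the multi-reward GRPO idea concrete, the following is a minimal sketch of how several per-sample rewards might be folded into one scalar and then turned into group-relative advantages. The weights, the reward key names, and the helper names `combine_rewards` and `group_relative_advantages` are illustrative assumptions; the abstract does not state how the four rewards are weighted or combined. Only the group normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
from statistics import mean, pstdev

# Hypothetical equal weights: the source does not say how rewards are combined.
WEIGHTS = {"format": 1.0, "dino": 1.0, "image_text": 1.0, "efficiency": 1.0}


def combine_rewards(rewards, weights=WEIGHTS):
    """Collapse one sample's named rewards into a single scalar (weighted sum)."""
    return sum(weights[name] * rewards[name] for name in weights)


def group_relative_advantages(scalar_rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its sampling group.

    Uses the population standard deviation (an assumption; implementations
    differ) and adds eps so a group with identical rewards yields zeros.
    """
    mu = mean(scalar_rewards)
    sigma = pstdev(scalar_rewards)
    return [(r - mu) / (sigma + eps) for r in scalar_rewards]


# Three hypothetical SVG candidates sampled for the same prompt.
group = [
    {"format": 1.0, "dino": 0.9, "image_text": 0.8, "efficiency": 0.7},
    {"format": 1.0, "dino": 0.5, "image_text": 0.6, "efficiency": 0.9},
    {"format": 0.0, "dino": 0.0, "image_text": 0.0, "efficiency": 0.0},
]
scalars = [combine_rewards(r) for r in group]
advantages = group_relative_advantages(scalars)
```

Under this sketch, a candidate that fails the format check and scores zero everywhere receives a negative advantage relative to its group, while the best-scoring candidate receives the largest positive one.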

Paper Structure (18 sections, 10 equations, 7 figures, 4 tables)


Figures (7)

  • Figure 1: Overview of CTRL-S. (Top Left) The Multi-Task Multi-Reward GRPO training framework integrates diverse generation tasks (Text-to-SVG, Image-to-SVG, and Code Refinement) guided by multiple rewards. (Bottom Left) During inference, CTRL-S leverages chain-of-thought reasoning to plan step-by-step drawing operations before generating the final group-level structured SVG code, ensuring a clear one-to-one correspondence between the reasoning steps and the generated code groups. (Right) Examples of high-quality generated SVGs and successful code refinement processes.
  • Figure 2: The overall pipeline of CTRL-S. (1) Two-Stage SFT: The model is first trained on 1M SAgoge samples to align SVG-specific tokens, and then fine-tuned on SVG-Sophia to learn CoT-structured responses with explicit step-wise planning. (2) Multi-Task Multi-Reward RL: We jointly optimize Text-to-SVG, Image-to-SVG, and SVG refinement tasks via a multi-reward mechanism, including Format Reward, DINO Reward, Image-text Similarity Reward, and Code Efficiency Reward, to improve structural validity, visual fidelity, semantic alignment, and concise code generation.
  • Figure 3: Qualitative comparisons of SVG generation and code refinement between baselines and CTRL-S.
  • Figure 4: Qualitative visualization of SVG generation quality across RL training steps.
  • Figure 5: Examples of Text-to-SVG in SVG-Sophia.
  • ...and 2 more figures
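
The one-to-one correspondence between reasoning steps and code groups described in the Figure 1 caption can be illustrated with a small check. This is only a sketch under stated assumptions: it treats each top-level `<g>` element of the SVG as one "code group" and takes the reasoning steps as a plain list of strings. The helper names `count_top_level_groups` and `steps_match_groups` are hypothetical, and the captions do not say how (or whether) the authors verify this correspondence.

```python
import xml.etree.ElementTree as ET


def count_top_level_groups(svg_code: str) -> int:
    """Count <g> elements that are direct children of the root <svg>."""
    root = ET.fromstring(svg_code)
    # ElementTree prefixes tags with "{namespace}", so compare the local name.
    return sum(1 for child in root if child.tag.rsplit("}", 1)[-1] == "g")


def steps_match_groups(steps, svg_code: str) -> bool:
    """True when the number of planned steps equals the number of groups.

    Malformed SVG is treated as a mismatch rather than raising.
    """
    try:
        return len(steps) == count_top_level_groups(svg_code)
    except ET.ParseError:
        return False


# Hypothetical two-step plan and a matching two-group SVG.
example_steps = ["Draw the circular background", "Draw the small square"]
example_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
    '<g><circle cx="5" cy="5" r="4"/></g>'
    '<g><rect width="2" height="2"/></g>'
    "</svg>"
)
```

A count-based check like this only confirms that the plan and the code have the same number of units; it does not verify that each group actually draws what its step describes.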