Controlling Your Image via Simplified Vector Graphics

Lanqing Guo, Xi Liu, Yufei Wang, Zhihao Li, Siyu Huang

TL;DR

This work introduces layer-wise controllable generation through simplified vector graphics (VG) representations and designs a novel image synthesis framework guided by VGs, allowing users to freely modify elements and seamlessly translate these edits into photorealistic outputs.

Abstract

Recent advances in image generation have achieved remarkable visual quality, yet a fundamental challenge remains: can image generation be controlled at the element level, enabling intuitive modifications such as adjusting shapes, altering colors, or adding and removing objects? In this work, we address this challenge by introducing layer-wise controllable generation through simplified vector graphics (VGs). Our approach first efficiently parses images into hierarchical VG representations that are semantically aligned and structurally coherent. Building on this representation, we design a novel image synthesis framework guided by VGs, allowing users to freely modify elements and seamlessly translate these edits into photorealistic outputs. By leveraging the structural and semantic features of VGs in conjunction with noise prediction, our method provides precise control over geometry, color, and object semantics. Extensive experiments demonstrate the effectiveness of our approach in diverse applications, including image editing, object-level manipulation, and fine-grained content creation, establishing a new paradigm for controllable image generation. Project page: https://guolanqing.github.io/Vec2Pix/

Paper Structure (13 sections, 14 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Overall framework of Vec2Pix and its workflow. (1) Prepare SVG: the input is obtained by converting a real or AI-generated image into SVG, or by selecting an existing SVG from a gallery. (2) SVG-to-Image: the SVG is used as a condition via token concatenation and the noise prediction from vectors (NPV) module. The NPV module incorporates the SVG condition and integrates trainable LoRA adapters and prediction heads to estimate the mean and variance of the initial noise, rather than directly sampling from Gaussian noise. If the user wishes to re-generate or modify specific parts, we proceed with steps (3)–(5). (3) Image-to-SVG: the generated image is converted back into SVG using a diffusion model to produce multiple layers, followed by SAM to generate semantic masks for each layer, and further refined via 2D Gaussian optimization. (4) SVG Editing: users can interactively edit the SVG by adjusting curves and attributes. (5) Re-generation: the modified SVG is used as guidance to synthesize the final updated result.
  • Figure 2: Our hierarchical and simplified vector graphics. As illustrated, the SVG is decomposed into the robot "body", which further contains semantic parts such as the "head", "upper body", "legs", and "shoes". The "head" is hierarchically subdivided into the "eye region", "ears", and "mouth", and the "eye region" is further decomposed into the "left eye" and "right eye".
  • Figure 3: Our Vec2Pix supports various controllable image generation and editing tasks.
  • Figure 4: Visual comparisons with text-prompt-guided editing methods, including state-of-the-art open-source and commercial solutions such as GPT-4o [gpt4o2024], Qwen-Image [wu2025qwen], Flux-Kontext [labs2025flux], and ICEdit [zhang2025context]. Editing cases such as shape modification, object repositioning, and color adjustment are readily supported by our VG representation, whereas text-guided editing often fails.
  • Figure 5: (a) Ablation study comparing results with and without the proposed Noise Prediction from Vectors (NPV) module, as well as training the NPV module for different numbers of iterations. (b) PSNR and FID performance variations under different vector scales used to adjust the conditioning strength. By adjusting the vector scale, our method can flexibly adapt to complex appearance effects such as water reflections, smoke, and lighting variations, without being overly constrained by the SVG geometry.
  • ...and 1 more figures
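
The part hierarchy described in the Figure 2 caption can be pictured as nested SVG groups. The following is a minimal sketch under stated assumptions, not the paper's parsing pipeline: the nested-dict layout, the helper names (`build_groups`, `leaf_ids`, `recolor`), the group ids, and the placeholder triangle paths are all hypothetical. Only the part names are taken from the caption. It illustrates why a layered representation makes element-level edits simple: changing one group's attributes affects exactly the parts beneath it.

```python
import xml.etree.ElementTree as ET

# Part hierarchy as named in the Figure 2 caption. The nested-dict layout is
# an assumption for illustration, not the paper's data format.
HIERARCHY = {
    "body": {
        "head": {
            "eye region": {"left eye": {}, "right eye": {}},
            "ears": {},
            "mouth": {},
        },
        "upper body": {},
        "legs": {},
        "shoes": {},
    }
}


def build_groups(parent, tree):
    """Recursively emit one <g> element per named part."""
    for name, children in tree.items():
        group = ET.SubElement(parent, "g", {"id": name.replace(" ", "-")})
        if children:
            build_groups(group, children)
        else:
            # Placeholder geometry; real layers would carry fitted curves.
            ET.SubElement(group, "path", {"d": "M0 0 L1 0 L1 1 Z", "fill": "#888888"})


def leaf_ids(element):
    """Return ids of groups that hold geometry directly (no nested groups)."""
    return [g.get("id") for g in element.iter("g") if g.find("g") is None]


def recolor(root, part_id, fill):
    """Set the fill of every path beneath the group with the given id."""
    group = next(g for g in root.iter("g") if g.get("id") == part_id)
    for path in group.iter("path"):
        path.set("fill", fill)


svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg", "viewBox": "0 0 1 1"})
build_groups(svg, HIERARCHY)

# One edit on the "eye region" group recolors both eyes and nothing else.
recolor(svg, "eye-region", "#00aaff")
print(ET.tostring(svg, encoding="unicode"))
```

In this sketch an edit is addressed by part name rather than by pixel region, which mirrors the kind of shape, color, and object-level edits the captions describe. How the edited SVG is then turned back into an image is outside the scope of this example.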