LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Yiren Song, Danze Chen, Mike Zheng Shou

TL;DR

LayerTracer is a diffusion transformer (DiT) framework for cognitive-aligned layered SVG synthesis that learns designers' layer-by-layer workflows from a large-scale dataset of sequential design operations. Its pipeline has two stages: a text-conditioned DiT generates a raster construction sequence, and a layer-wise vectorization stage converts that sequence into clean, editable SVGs. An image-conditioned variant, added through LoRA, infers construction steps from a reference image. The authors report state-of-the-art results in both text-to-SVG generation and hierarchical vectorization, with improved editability, reduced path redundancy, and coherent layer hierarchies, supported by quantitative metrics and user studies. They also release a 20k-sample serpentine dataset, positioning LayerTracer as a step toward cognitive-informed design tooling and more editable AI-generated vector art.

Abstract

Generating cognitive-aligned layered SVGs remains challenging because existing methods tend toward either oversimplified single-layer outputs or optimization-induced shape redundancies. We propose LayerTracer, a diffusion-transformer-based framework that bridges this gap by learning designers' layered SVG creation processes from a novel dataset of sequential design operations. Our approach operates in two phases. First, a text-conditioned DiT generates multi-phase rasterized construction blueprints that simulate human design workflows. Second, layer-wise vectorization with path deduplication produces clean, editable SVGs. For image vectorization, we introduce a conditional diffusion mechanism that encodes reference images into latent tokens, guiding hierarchical reconstruction while preserving structural integrity. Extensive experiments demonstrate LayerTracer's superior performance against optimization-based and neural baselines in both generation quality and editability, effectively aligning AI-generated vectors with professional design cognition.

Paper Structure

This paper contains 22 sections, 5 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: LayerTracer creates cognitively aligned layered SVGs from text prompts or by converting images into layered SVGs.
  • Figure 2: The LayerTracer architecture comprises three key components: (1) Layer-wise Model: pretrained on our proposed dataset to generate layered pixel sequences from a text prompt; (2) Image2Layers Model: merges LoRA with the Flux base DiT, enabling image-conditioned generation through VAE-encoded latent tokens; (3) Layer-wise Vectorization: converts raster sequences to SVGs via differential analysis between adjacent layers, followed by Bézier optimization using vtracer to eliminate redundant paths while preserving structural fidelity.
  • Figure 3: Given a text prompt, LayerTracer generates cognitive-aligned layered SVGs that mimic human design cognition.
  • Figure 4: Given a raster image of an icon as input, LayerTracer predicts how the icon was created layer by layer, achieving cognitive-aligned layered vectorization.
  • Figure 5: Layer-wise SVG generation with color gradients.
  • ...and 8 more figures
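
The Figure 2 caption describes layer-wise vectorization as a differential analysis between adjacent layers of the raster sequence. The sketch below illustrates that idea only and is not the authors' implementation. It assumes the construction sequence is already available as a list of equally sized RGB arrays, that the canvas starts out white, and that a fixed per-pixel threshold is enough to detect change. The names `layer_masks`, `extract_layer`, and `threshold` are hypothetical.

```python
import numpy as np


def layer_masks(frames, threshold=10):
    """Return one boolean mask per construction step marking changed pixels.

    frames: list of HxWx3 uint8 arrays, ordered from first step to final image.
    Assumption: the first frame is compared against a blank white canvas.
    """
    prev = np.full_like(frames[0], 255)
    masks = []
    for frame in frames:
        # Cast to a signed type so the subtraction cannot wrap around.
        diff = np.abs(frame.astype(np.int16) - prev.astype(np.int16))
        # A pixel counts as changed if any channel moved by more than threshold.
        masks.append(diff.max(axis=-1) > threshold)
        prev = frame
    return masks


def extract_layer(frame, mask, background=255):
    """Keep only the newly added pixels of a frame; the rest becomes background."""
    layer = np.full_like(frame, background)
    layer[mask] = frame[mask]
    return layer


# Toy sequence: a red square is drawn first, then a blue square.
blank = np.full((4, 4, 3), 255, dtype=np.uint8)
step1 = blank.copy()
step1[0:2, 0:2] = (255, 0, 0)
step2 = step1.copy()
step2[2:4, 2:4] = (0, 0, 255)

masks = layer_masks([step1, step2])
layers = [extract_layer(f, m) for f, m in zip([step1, step2], masks)]
```

Each entry of `layers` holds only the shape added at that step, so it could be passed to a raster-to-vector tracer on its own and the resulting paths stacked in step order. Real generated frames would likely need a more tolerant change test than a fixed threshold, since a shape drawn over an earlier one may differ only slightly in color.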