SVG360: Multi-View SVG Generation with Geometric and Color Consistency from a Single SVG

Mengnan Jiang, Zhaolin Sun, Christian Franke, Michele Franco Adesso, Antonio Haas, Grace Li Zhang

TL;DR

SVG360 produces multi-view, fully editable SVGs from a single input SVG using a three-stage pipeline: 3D-aware multi-view raster generation with appearance harmonization, spatially aligned segmentation propagation across the viewing sphere, and vector-domain consolidation that yields compact, coherent vector paths. The method uses Trellis for 3D-based raster synthesis, Gaussian splatting for efficient multi-view rendering, and a Spatial-SAM2 module that propagates part-level segmentation using a sphere-based proximity metric $d(\theta_i,\theta_j)=\mathrm{atan2}(\|u_i \times u_j\|, \mathrm{clip}(u_i\cdot u_j,-1,1))$. Raster segments are then converted to vector paths with VTracer, followed by color and topology consolidation; colors are mapped to a reference palette using the CIEDE2000 distance to ensure cross-view consistency. Quantitative results show lower path counts and less color drift across views than Adobe Turntable, indicating improved geometric stability and editability for design workflows, with applications in scalable asset creation and semantic vector editing.
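The proximity metric above can be illustrated with a minimal sketch. It assumes that $u_i$ and $u_j$ are unit viewing directions on the sphere, which the summary does not state explicitly; the function names and the azimuth/elevation parameterization are hypothetical and chosen only for illustration.

```python
import math


def unit_view_direction(azimuth_deg, elevation_deg):
    """Hypothetical helper: map camera angles to a unit vector (assumed convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_distance(u_i, u_j):
    """d = atan2(||u_i x u_j||, clip(u_i . u_j, -1, 1)), in radians."""
    # Cross product components.
    cx = u_i[1] * u_j[2] - u_i[2] * u_j[1]
    cy = u_i[2] * u_j[0] - u_i[0] * u_j[2]
    cz = u_i[0] * u_j[1] - u_i[1] * u_j[0]
    cross_norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    # Dot product, clipped so rounding error cannot push it outside [-1, 1].
    dot = sum(a * b for a, b in zip(u_i, u_j))
    dot = max(-1.0, min(1.0, dot))
    return math.atan2(cross_norm, dot)


if __name__ == "__main__":
    front = unit_view_direction(0.0, 0.0)
    side = unit_view_direction(90.0, 0.0)
    print(math.degrees(angular_distance(front, side)))
```

Using `atan2` on the cross-product norm and the clipped dot product, rather than `acos` of the dot product alone, tends to behave better numerically when two views are nearly identical or nearly opposite.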

Abstract

Scalable Vector Graphics (SVGs) are central to modern design workflows, offering scaling without distortion and precise editability. However, generating multi-view consistent SVGs of a single object from a single-view input remains underexplored. We present a three-stage framework that produces multi-view SVGs with geometric and color consistency from a single SVG input. First, the rasterized input is lifted to a 3D representation and rendered under target camera poses, producing multi-view images of the object. Next, we extend the temporal memory mechanism of Segment Anything 2 (SAM2) to the spatial domain, constructing a spatial memory bank that establishes part-level correspondences across neighboring views, yielding cleaner and more consistent vector paths and color assignments without retraining. Finally, during the raster-to-vector conversion, we perform path consolidation and structural optimization to reduce redundancy while preserving boundaries and semantics. The resulting SVGs exhibit strong geometric and color consistency across views, significantly reduce redundant paths, and retain fine structural details. This work bridges generative modeling and structured vector representation, providing a scalable route to single-input, object-level multi-view SVG generation and supporting applications such as asset creation and semantic vector editing.

Paper Structure

This paper contains 15 sections, 3 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Our pipeline begins by converting the input SVG into a raster image, followed by rendering 3D-consistent multi-view rasters using Trellis. A lightweight LoRA-tuned [hu2021lora] FLUX [blackforestlabs2024flux] model is then applied to harmonize their appearance. The refined rasters are processed by our Spatial SAM2 module, which replaces temporal adjacency in SAM2 with a spatial nearest-neighbor traversal on the viewing sphere. During segmentation propagation, the Spatial Memory Selector retrieves the geometrically most relevant memory entries. For the target view $t_{n-3}$, spatially adjacent views such as $t_{9}$, $t_{55}$, or $t_{n-6}$ may be closer than its temporal neighbor $t_{n-4}$, which helps maintain part-level consistency across viewpoints. The resulting masks are vectorized by VTracer [vtracer2023] and further refined in the vector domain to produce compact, editable, and cross-view consistent multi-view SVGs.
  • Figure 2: Qualitative comparison. The figure summarizes representative issues observed in Adobe Turntable: (a) geometric inconsistencies across adjacent views; (b) cluttered structures arising from overlapping thin components; (c) merging of parts with similar colors; (d) missing regions in certain viewpoints; (e) color drift and gradual loss of small details across views. Our method produces multi-view SVGs with stable geometry, clear part separation, and consistent color appearance.
  • Figure 3: Segmentation comparison of Spatial-SAM2 and the original SAM2 tracking mode at view 47. The bottom row shows the six reference views each method uses for memory support, with view 0 serving as the initial key frame. Results shown here correspond to the first iteration, before any subsequent refinement.
  • Figure 4: Ablation study of four segmentation strategies: Spatial SAM2 (ours), SAM2 Tracking Mode, SAM2 Auto Mode, and a segmentation-free VTracer baseline.
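The spatial nearest-neighbor retrieval described in the Figure 1 caption can be sketched as follows. This is an illustrative approximation, not the authors' Spatial Memory Selector: the function names, the two-ring camera layout, and the plain "k closest already-segmented views" rule are all assumptions. The default `k=6` only mirrors the six reference views mentioned in the Figure 3 caption.

```python
import math


def angular_distance(u, v):
    """atan2(||u x v||, clip(u . v, -1, 1)) for unit vectors u and v."""
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)


def ring(n_views, elevation_deg):
    """Hypothetical camera layout: n_views evenly spaced on one elevation ring."""
    el = math.radians(elevation_deg)
    dirs = []
    for i in range(n_views):
        az = 2.0 * math.pi * i / n_views
        dirs.append((math.cos(el) * math.cos(az),
                     math.cos(el) * math.sin(az),
                     math.sin(el)))
    return dirs


def select_memory_views(target, directions, processed, k=6):
    """Return up to k already-processed view indices closest to the target view."""
    candidates = [i for i in processed if i != target]
    # Sort by angular distance on the viewing sphere; index breaks exact ties.
    candidates.sort(
        key=lambda i: (angular_distance(directions[target], directions[i]), i)
    )
    return candidates[:k]


if __name__ == "__main__":
    # Views 0-7 sit at elevation 0 degrees, views 8-15 at elevation 30 degrees.
    directions = ring(8, 0.0) + ring(8, 30.0)
    # Views 0..8 are assumed to be segmented already; view 9 is the target.
    print(select_memory_views(9, directions, processed=range(9), k=2))
```

In this toy layout, the view directly below the target on the lower ring is spatially closer than the view that precedes it in index order, which is the situation the caption describes for $t_{n-3}$ and its temporal neighbor $t_{n-4}$.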