Illustrator's Depth: Monocular Layer Index Prediction for Image Decomposition

Nissim Maruani, Peiying Zhang, Siddhartha Chaudhuri, Matthew Fisher, Nanxuan Zhao, Vladimir G. Kim, Pierre Alliez, Mathieu Desbrun, Wang Yifan

TL;DR

Illustrator's Depth reframes depth as an editable, per-pixel layer index rather than a physical metric, enabling robust image decomposition into layer-ordered vector graphics. The approach trains a network to predict a continuous per-pixel depth map $D(I)$ from raster inputs by rasterizing layered SVGs into ground-truth depth via a base-256 encoding and applying a scale-invariant MAE objective. Coupled with a dedicated vectorization pipeline, the method achieves state-of-the-art layer ordering and visual fidelity, enabling high-quality vectorization, text-to-vector generation, and depth-aware editing, with additional benefits for 3D relief generation and tactile graphics. The work demonstrates strong generalization across diverse inputs and datasets, and points toward future zero-shot inference and broader applicability in creative workflows. Overall, Illustrator's Depth provides a practical, edit-friendly foundation for decomposing images into layered, manipulable representations.
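To make the two training ingredients named above more concrete, here is a minimal sketch of what a base-256 layer-index encoding and a scale-invariant MAE could look like. Everything in it is an assumption for illustration: the function names, the choice to pack the index into RGB channels with the least significant digit in red, and the median / mean-absolute-deviation normalization (which removes both shift and scale) are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np


def encode_layer_index(index_map: np.ndarray) -> np.ndarray:
    """Pack integer layer indices (H, W) into an RGB image (H, W, 3).

    Hypothetical convention: index = R + 256 * G + 256**2 * B, so each
    layer can be rasterized with a flat color that encodes its rank.
    """
    idx = index_map.astype(np.int64)
    r = idx % 256
    g = (idx // 256) % 256
    b = (idx // 256**2) % 256
    return np.stack([r, g, b], axis=-1).astype(np.uint8)


def decode_layer_index(rgb: np.ndarray) -> np.ndarray:
    """Invert encode_layer_index: recover the integer index per pixel."""
    rgb = rgb.astype(np.int64)
    return rgb[..., 0] + 256 * rgb[..., 1] + 256**2 * rgb[..., 2]


def normalize(depth: np.ndarray) -> np.ndarray:
    """Remove shift (median) and scale (mean absolute deviation)."""
    d = depth.astype(np.float64)
    shift = np.median(d)
    scale = np.mean(np.abs(d - shift))
    return (d - shift) / max(scale, 1e-8)


def scale_invariant_mae(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error between normalized prediction and target.

    Assumed formulation: both maps are normalized first, so only the
    relative ordering and spacing of layers is penalized.
    """
    return float(np.mean(np.abs(normalize(pred) - normalize(target))))


if __name__ == "__main__":
    layers = np.array([[0, 1], [300, 70000]])
    rgb = encode_layer_index(layers)
    print(rgb[1, 0], decode_layer_index(rgb)[1, 1])

    target = np.array([[0.0, 1.0], [2.0, 3.0]])
    print(scale_invariant_mae(3.0 * target + 5.0, target))
```

Under these assumptions, a prediction that differs from the ground truth only by a positive rescaling and an offset incurs (near) zero loss, which fits the idea that the layer ordering matters rather than absolute depth values.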

Abstract

We introduce Illustrator's Depth, a novel definition of depth that addresses a key challenge in digital content creation: decomposing flat images into editable, ordered layers. Inspired by an artist's compositional process, illustrator's depth assigns a layer index to each pixel, forming an interpretable image decomposition through a discrete, globally consistent ordering of elements optimized for editability. We also propose and train a neural network using a curated dataset of layered vector graphics to predict layering directly from raster inputs. Our layer index inference unlocks a range of powerful downstream applications. In particular, it significantly outperforms state-of-the-art baselines for image vectorization while also enabling high-fidelity text-to-vector-graphics generation, automatic 3D relief generation from 2D images, and intuitive depth-aware editing. By reframing depth from a physical quantity to a creative abstraction, illustrator's depth prediction offers a new foundation for editable image decomposition.

Paper Structure

This paper contains 56 sections, 2 equations, 20 figures, 4 tables.

Figures (20)

  • Figure 1: Overview. Given an input image, our model predicts Illustrator’s Depth, a learned ordering of compositional layers that reflects how an artist might have structured the image layout. This representation, applicable broadly to illustrations (left), paintings (middle), or even some realistic images (right), enables multiple downstream applications such as vectorization, intuitive editing, text-to-vector generation, and 3D relief fabrication.
  • Figure 2: Physical vs. Illustrator's Depth. Unlike monocular depth estimation, illustrator’s depth (middle, in false colors) produces piecewise-flat regions corresponding to layers and preserves compositional ordering even for printed or flat elements (e.g., shadows, drawings, or textures) that lack real-world depth (right).
  • Figure 3: Depth-aware Image Vectorization. Our predicted illustrator’s depth map (bottom left) can be integrated in traditional vectorization pipelines to produce well-layered SVG images (right, in 3D for clarity). On this example, our model allows the grouping of two disconnected white clusters to form a single background layer, while accurately separating the white highlights.
  • Figure 4: Predicted illustrator's depth evaluation. Conventional monocular depth models (DepthAnything-v2 [yang_depth_2024] (d), DepthPro [bochkovskii_depth_2025] (e)) predict physical depth; in contrast, our model (c) accurately infers layer indices suitable for illustration decomposition.
  • Figure 5: Image vectorization with illustrator's depth. Paired with standard vectorization pipelines, our method produces editable, depth-ordered SVGs that closely preserve the structure of the input image. Compared to heuristic, optimization-driven, or learning-based baselines, our approach systematically yields much cleaner layering and higher visual fidelity.
  • ...and 15 more figures