DepthScape: Authoring 2.5D Designs via Depth Estimation, Semantic Understanding, and Geometry Extraction

Xia Su, Cuong Nguyen, Matheus A. Gadelha, Jon E. Froehlich

TL;DR

DepthScape tackles the challenge of creating 2.5D effects by placing 2D design elements into 3D reconstructions obtained from monocular depth estimation. It combines MoGe-based depth reconstruction with GPT-4o-driven visual-program synthesis to automatically extract parametric anchors, enabling direct manipulation on a 2D canvas while preserving realistic occlusion and perspective. The approach is validated through a formative user study, a technical evaluation on 100 stock images, and expert reviews, which indicate feasibility, robustness, and practical potential. Five application scenarios illustrate DepthScape's versatility, including integration with image editors, 2.5D video, AR-like simulations, and storyboard workflows. Overall, DepthScape offers a human-centered bridge between 2D design and depth-aware composition, lowering the barrier to creating rich 2.5D visuals.

Abstract

2.5D effects, such as occlusion and perspective foreshortening, enhance visual dynamics and realism by incorporating 3D depth cues into 2D designs. However, creating such effects remains challenging and labor-intensive due to the complexity of depth perception. We introduce DepthScape, a human-AI collaborative system that facilitates 2.5D effect creation by directly placing design elements into 3D reconstructions. Using monocular depth reconstruction, DepthScape transforms images into 3D reconstructions where visual contents are placed to automatically achieve realistic occlusion and perspective foreshortening. To further simplify 3D placement through a 2D viewport, DepthScape uses a vision-language model to analyze source images and extract key visual components as content anchors for direct manipulation editing. We evaluate DepthScape with nine participants of varying design backgrounds, confirming the effectiveness of our creation pipeline. We also test on 100 professional stock images to assess robustness, and conduct an expert evaluation that confirms the quality of DepthScape's results.
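
The occlusion effect described in the abstract can be illustrated with a per-pixel depth test. The following is a minimal sketch, not DepthScape's implementation: the function name `composite_with_occlusion` and all of its parameters are hypothetical, and it assumes the design element has already been rasterized into the image plane with a depth value per pixel. It covers occlusion only, not perspective foreshortening.

```python
import numpy as np


def composite_with_occlusion(image, scene_depth, element_rgba, element_depth):
    """Blend a design element into an image, hiding it where the scene is closer.

    All names here are illustrative assumptions, not DepthScape's API.
    image:         (H, W, 3) float array, source photo in [0, 1]
    scene_depth:   (H, W) float array, estimated depth per pixel (smaller = closer)
    element_rgba:  (H, W, 4) float array, rasterized design element with alpha
    element_depth: (H, W) float array, depth of the element at each pixel
    """
    # The element is visible only where it lies in front of the scene surface.
    visible = (element_depth < scene_depth)[..., None]
    # Zero the alpha channel wherever the scene occludes the element.
    alpha = element_rgba[..., 3:4] * visible
    # Standard alpha blending of the element over the source image.
    return alpha * element_rgba[..., :3] + (1.0 - alpha) * image
```

Because the test is applied per pixel, an element placed at an intermediate depth appears behind foreground objects and in front of the background without any manual masking.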

Paper Structure

This paper contains 31 sections, 18 figures, and 2 tables.

Figures (18)

  • Figure 1: Design result gallery of DepthScape. Top row icons show the formation of designs, with one or multiple Planar, Cylindrical, and Spherical parametric anchors.
  • Figure 2: Three types of parametric anchoring supported by DepthScape. From left to right: Planar, Cylindrical, and Spherical.
  • Figure 3: Proof-of-concept prototype interface.
  • Figure 4: In our formative user study, we asked participants to replicate two designs (left) using the provided assets (middle) and our prototype system. Example results are shown on the right.
  • Figure 5: Design results of the open-ended exploration of the formative user study.
  • ...and 13 more figures