DRAWER: Digital Reconstruction and Articulation With Environment Realism

Hongchi Xia, Entong Su, Marius Memmel, Arhan Jain, Raymond Yu, Numfor Mbiziwo-Tiapo, Ali Farhadi, Abhishek Gupta, Shenlong Wang, Wei-Chiu Ma

TL;DR

DRAWER addresses the gap between photorealistic 3D reconstruction and physical interactivity by introducing a compositional dual scene representation that combines neural signed distance fields for geometry with Gaussian splats for high-fidelity appearance. An articulation module infers articulation types and hinge axes, enabling interactive manipulation of drawers and other movable elements, while amodal shape estimation and texturing complete hidden interiors. The system runs in real time, integrates with Unreal Engine and robotics simulators, and supports a real-to-sim-to-real loop for policy learning and deployment. Experiments across six kitchens show superior rendering fidelity, articulation accuracy, and motion realism, and demonstrations include interactive gaming and robot learning. Overall, DRAWER offers a scalable, automated path from real-world video to realistic, interactive digital twins with practical impact for content creation and robotics.

Abstract

Creating virtual digital replicas from real-world data unlocks significant potential across domains like gaming and robotics. In this paper, we present DRAWER, a novel framework that converts a video of a static indoor scene into a photorealistic and interactive digital environment. Our approach centers on two main contributions: (i) a reconstruction module based on a dual scene representation that reconstructs the scene with fine-grained geometric details, and (ii) an articulation module that identifies articulation types and hinge positions, reconstructs simulatable shapes and appearances and integrates them into the scene. The resulting virtual environment is photorealistic, interactive, and runs in real time, with compatibility for game engines and robotic simulation platforms. We demonstrate the potential of DRAWER by using it to automatically create an interactive game in Unreal Engine and to enable real-to-sim-to-real transfer for robotics applications.
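
To make the articulation idea concrete, the sketch below moves a single 3D point under the two joint types that drawers and cabinet doors typically exhibit: a prismatic joint (sliding along an axis) and a revolute joint (rotating about a hinge axis through a pivot). It is an illustration only and not DRAWER's implementation; the function name `articulate_point`, its parameters, and the use of Rodrigues' rotation formula are assumptions made for this example.

```python
import math


def _normalize(v):
    """Return v scaled to unit length; reject the zero vector."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("axis must be non-zero")
    return tuple(c / n for c in v)


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def articulate_point(point, joint_type, axis, pivot=(0.0, 0.0, 0.0), amount=0.0):
    """Hypothetical helper: move `point` according to a joint.

    joint_type: "prismatic" (amount = sliding distance along `axis`)
                or "revolute" (amount = angle in radians about `axis`,
                which passes through `pivot`).
    """
    k = _normalize(axis)
    if joint_type == "prismatic":
        # A drawer slides: translate along the unit axis.
        return tuple(p + amount * c for p, c in zip(point, k))
    if joint_type == "revolute":
        # A door swings: rotate about the hinge line (Rodrigues' formula).
        v = tuple(p - o for p, o in zip(point, pivot))
        c, s = math.cos(amount), math.sin(amount)
        kxv = _cross(k, v)
        kdv = _dot(k, v)
        rotated = tuple(
            v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c) for i in range(3)
        )
        return tuple(r + o for r, o in zip(rotated, pivot))
    raise ValueError(f"unknown joint type: {joint_type!r}")
```

Applying the same transform to every vertex of a door or drawer mesh yields the opened state; the joint type, axis, and pivot are exactly the quantities an articulation module has to estimate.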

Paper Structure

This paper contains 44 sections, 13 figures, 7 tables.

Figures (13)

  • Figure 1: DRAWER automatically converts a video of a static scene into an interactable and actionable virtual environment. The reconstructed digital twin features precise geometry, high-fidelity rendering, and supports physical interactions like opening/closing drawers/cabinets and moving/placing objects. It can also be seamlessly integrated with modern game engines and robotic simulation platforms, enabling the creation of interactive games and facilitating real-to-sim-to-real policy transfer.
  • Figure 2: Overview of DRAWER: Given multiple posed images from a single video, we first employ a dual scene representation to capture high-fidelity visual appearance as well as fine-grained geometry. Then we animate the scene by reasoning about articulated and movable rigid-body objects. Finally, our amodal shape estimation with hidden region texturing enables us to create a complete digital twin. Our reconstructions support real-time physical interactions such as opening drawers/cabinets, moving objects, and rendering novel views.
  • Figure 3: Qualitative Comparisons of Different Representations: (Left) Gaussian splatting [kerbl20233d, Huang2DGS2024] can effectively capture the visual appearance of a scene yet struggles with accurate geometric modeling. (Middle) Neural SDF recovers fine-grained geometry but at the cost of slow rendering speed and degraded appearance modeling. (Right) Our dual representation combines the strengths of both representations, offering both high-quality appearance and geometry in real time.
  • Figure 4: Articulation Estimation: We visualize the estimated revolute axes and articulated object masks produced by 3DOI [qian2023understanding] and DRAWER, demonstrating that DRAWER achieves more precise articulation estimation due to its underlying 3D geometry.
  • Figure 5: Our dual scene representation combines Neural SDF and Gaussian splatting. We anchor Gaussians around the reconstructed mesh (zero-level set) extracted from the SDF. For details on our Gaussian splat parameterization, please refer to the supplementary material.
  • ...and 8 more figures
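
The Figure 5 caption describes anchoring Gaussians around the mesh extracted from the SDF. A minimal sketch of one way such anchoring could work is shown below: each Gaussian center is stored as barycentric weights on a triangle plus a small signed offset along the face normal, so the centers stay tied to the surface. The actual parameterization is deferred to the supplementary material and is not reproduced here; the function `anchor_gaussians`, its parameters (`per_face`, `max_offset`, `seed`), and the uniform sampling scheme are all assumptions made for illustration.

```python
import math
import random


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def anchor_gaussians(vertices, faces, per_face=4, max_offset=0.01, seed=0):
    """Hypothetical helper: place Gaussian centers near a triangle mesh.

    Each anchor records the face index, barycentric weights on that face,
    a signed offset along the face normal, and the resulting 3D center.
    """
    rng = random.Random(seed)
    anchors = []
    for face_id, (i, j, k) in enumerate(faces):
        a, b, c = vertices[i], vertices[j], vertices[k]
        n = _cross(_sub(b, a), _sub(c, a))
        length = math.sqrt(sum(x * x for x in n))
        if length == 0.0:
            continue  # skip degenerate (zero-area) triangles
        n = tuple(x / length for x in n)
        for _ in range(per_face):
            # Uniform sampling inside a triangle via the square-root trick.
            r1, r2 = rng.random(), rng.random()
            s = math.sqrt(r1)
            w = (1.0 - s, s * (1.0 - r2), s * r2)
            d = rng.uniform(-max_offset, max_offset)
            center = tuple(
                w[0] * a[t] + w[1] * b[t] + w[2] * c[t] + d * n[t]
                for t in range(3)
            )
            anchors.append(
                {"face": face_id, "bary": w, "offset": d, "center": center}
            )
    return anchors
```

Because each center is expressed relative to a face, moving the mesh (for example, when a drawer opens) moves the anchored Gaussians with it, which is one plausible motivation for tying the appearance representation to the geometry.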