MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video

Hengyi Wang, Jingwen Wang, Lourdes Agapito

TL;DR

MorpheuS reconstructs accurate geometry and realistic appearance for dynamic scenes from casually captured monocular RGB-D video by decoupling the deformation from a hyper-dimensional canonical field and using a diffusion prior to complete unobserved regions. The method warps observed points into a canonical space and uses a hash-encoded SDF/color field to enable 360° rendering, while distilling knowledge from a view-conditioned diffusion model via Score Distillation Sampling (SDS). Its optimization combines real-view supervision with the diffusion prior and with regularization in both canonical and parameter space, augmented by temporal conditioning and view-aware weighting to stabilize training. Results on real and synthetic datasets show improved geometric accuracy, completion of unobserved regions with realistic textures, and strong novel-view synthesis. The work advances neural rendering of dynamic scenes by combining canonical-space regularization with diffusion priors to achieve high-fidelity 360° reconstruction from casual RGB-D input.

Abstract

Neural rendering has demonstrated remarkable success in dynamic scene reconstruction. Thanks to the expressiveness of neural representations, prior works can accurately capture the motion and achieve high-fidelity reconstruction of the target object. Despite this, real-world video scenarios often feature large unobserved regions where neural representations struggle to achieve realistic completion. To tackle this challenge, we introduce MorpheuS, a framework for dynamic 360° surface reconstruction from a casually captured RGB-D video. Our approach models the target scene as a canonical field that encodes its geometry and appearance, in conjunction with a deformation field that warps points from the current frame to the canonical space. We leverage a view-dependent diffusion prior and distill knowledge from it to achieve realistic completion of unobserved regions. Experimental results on various real-world and synthetic datasets show that our method can achieve high-fidelity 360° surface reconstruction of a deformable object from a monocular RGB-D video.
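
The two-field design described above (a deformation field that warps points from the current frame into canonical space, followed by a canonical field that decodes geometry and appearance) can be illustrated with a small sketch. This is not the authors' implementation: the real fields are learned networks, whereas the analytic functions below, the names `deform`, `canonical_field`, and `query`, and the constant `HYPER_DIM` are all hypothetical stand-ins chosen only to show the data flow.

```python
import numpy as np

HYPER_DIM = 2  # hypothetical number of extra "hyper" coordinates


def deform(points, t):
    """Toy deformation field (assumption, not the learned network).

    Maps observation-space points at time t to canonical coordinates
    and appends time-dependent hyper-coordinates.
    Returns an array of shape (N, 3 + HYPER_DIM).
    """
    points = np.asarray(points, dtype=float)
    # Toy motion: the object drifts along x by 0.1 * t, so we undo that drift.
    offset = np.array([0.1 * t, 0.0, 0.0])
    canonical_xyz = points - offset
    # Toy hyper-coordinates that depend only on time.
    hyper = np.tile([np.sin(t), np.cos(t)], (points.shape[0], 1))
    return np.concatenate([canonical_xyz, hyper], axis=1)


def canonical_field(canon):
    """Toy canonical field: decodes canonical coordinates into SDF and RGB."""
    xyz, hyper = canon[:, :3], canon[:, 3:]
    # A sphere whose radius is modulated by the first hyper-coordinate.
    radius = 0.5 + 0.05 * hyper[:, 0]
    sdf = np.linalg.norm(xyz, axis=1) - radius
    # Arbitrary smooth color in (0, 1), for illustration only.
    rgb = 0.5 + 0.5 * np.tanh(xyz)
    return sdf, rgb


def query(points, t):
    """Warp observed points to canonical space, then decode SDF and color."""
    return canonical_field(deform(points, t))
```

Calling `query(points, t)` with an `(N, 3)` array returns `N` signed distances and an `(N, 3)` color array. Because only `deform` depends on time, the same canonical shape is shared by every frame, which is the property that lets supervision from one frame constrain the others.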

Paper Structure

This paper contains 11 sections, 14 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: We propose MorpheuS, a dynamic scene reconstruction method that leverages neural implicit representations and diffusion priors for achieving 360° reconstruction of a moving object from a monocular RGB-D video. Our approach can achieve both metrically accurate reconstruction of the observed regions and photo-realistic completion of unobserved regions of a dynamic scene.
  • Figure 2: Overview of MorpheuS. 1) Dynamic surface rendering: we model the target dynamic scene via a deformation field that maps a point from observation space to a hyper-dimensional canonical space, and a canonical field that decodes the point into SDF and color. 2) Diffusion prior: we leverage a diffusion prior and perform Score Distillation Sampling (SDS) to complete the unobserved regions. Note that the denoising process runs in latent space; all visualizations are generated by decoding the latent vector, for illustration purposes only. 3) Optimization: we optimize the scene representation using real-view supervision ${\mathcal{L}}_{\mathrm{real}}$, the SDS loss ${\mathcal{L}}_{\mathrm{S}}$, canonical regularization ${\mathcal{L}}_{\mathrm{reg}}^{\mathrm{cano}}$, and parameter regularization ${\mathcal{L}}_{\mathrm{reg}}^{\mathrm{param}}$.
  • Figure 3: Real-world dataset reconstruction results (from left to right, top to bottom: Frog, Teddy, Human2, and Mochi). NDR (Cai et al., 2022) achieves high-quality surface reconstruction in the observed region but fails to produce photo-realistic completion, resulting in spurious surfaces in unobserved regions. In contrast, our method can produce high-quality 360° surface reconstruction.
  • Figure 4: Qualitative results of novel view synthesis. Please refer to the supplementary material for more quantitative and qualitative comparisons.
  • Figure 5: Ablation study on geometric initialization. Geometric initialization with a fixed blob function (Poole et al., 2022) fails here.
  • ...and 4 more figures
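
The optimization step summarized in the Figure 2 caption combines four loss terms. One common way to combine such terms is a weighted sum; the weights $\lambda$ below are assumptions introduced for illustration and are not taken from the source:

$$
\mathcal{L} = \lambda_{\mathrm{real}}\,{\mathcal{L}}_{\mathrm{real}} + \lambda_{\mathrm{S}}\,{\mathcal{L}}_{\mathrm{S}} + \lambda_{\mathrm{cano}}\,{\mathcal{L}}_{\mathrm{reg}}^{\mathrm{cano}} + \lambda_{\mathrm{param}}\,{\mathcal{L}}_{\mathrm{reg}}^{\mathrm{param}}
$$

For the SDS term, the gradient in its standard form, as introduced in DreamFusion and adapted here to a latent-space denoiser (the exact variant used by MorpheuS may differ), is

$$
\nabla_{\theta}\,{\mathcal{L}}_{\mathrm{S}} = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\bigl(\hat{\epsilon}_{\phi}(z_t;\, y,\, t) - \epsilon\bigr)\,\frac{\partial z}{\partial \theta} \right]
$$

where $\theta$ denotes the scene parameters, $z$ the latent of the rendered image, $z_t$ that latent after adding noise at timestep $t$, $\epsilon$ the sampled Gaussian noise, $\hat{\epsilon}_{\phi}$ the noise predicted by the diffusion model under conditioning $y$, and $w(t)$ a timestep-dependent weight. These symbols are this sketch's own notation rather than the paper's.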