Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints

Minghan Zhu, Zhiyi Wang, Qihang Sun, Maani Ghaffari, Michael Posa

TL;DR

This work tackles 3D object reconstruction under heavy occlusion by fusing data-driven shape priors with sparse, physics-based contact constraints. It introduces a training-free drag-based energy to guide a flow-matching 3D generator (Amodal3R/TRELLIS) using contact points estimated from videos, reducing ambiguity in unseen geometry. Across synthetic and real-world datasets, the approach improves reconstruction accuracy over pure 3D generation and contact-only optimization, demonstrating the value of integrating priors with sparse contact information. The method offers a practical pathway to more robust robot perception by blending data-driven priors with physics-aware cues.

Abstract

Object geometry is key information for robot manipulation. Yet, object reconstruction is a challenging task because cameras only capture partial observations of objects, especially when occlusion occurs. In this paper, we leverage two extra sources of information to reduce the ambiguity of vision signals. First, generative models learn priors of the shapes of commonly seen objects, allowing us to make reasonable guesses of the unseen part of geometry. Second, contact information, which can be obtained from videos and physical interactions, provides sparse constraints on the boundary of the geometry. We combine the two sources of information through contact-guided 3D generation. The guidance formulation is inspired by drag-based editing in generative models. Experiments on synthetic and real-world data show that our approach improves the reconstruction compared to pure 3D generation and contact-based optimization.

Paper Structure

This paper contains 24 sections, 3 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: This work develops a novel framework for 3D object reconstruction under occlusion by integrating data-driven 3D priors and physics-based contact information through guided generation. The two perspectives bring complementary insights that lead to high-quality and accurate 3D reconstruction.
  • Figure 2: Qualitative comparison of the geometry reconstruction. The heatmaps on the unguided and guided predictions depict the one-sided point-wise Chamfer distance from the predicted shape (refer to the error colormap). The heatmap on the ground truth (GT) mesh shows the improvement of the one-sided point-wise Chamfer distance from the GT shape (refer to the improvement colormap). The contact points are shown as yellow crosses. They are more visible in Figure 3.
  • Figure 3: Qualitative examples of the effect of the contact-point guidance. We show the predicted meshes from two viewing angles, 90 degrees apart, for better 3D interpretability. The gray meshes are the ground truth for reference. The red points are contact points, with a zoomed-in visualization on the right (the zoomed-in area is marked by a red square), in which each is connected to the nearest point on the predicted surface, shown in black. The yellow crosses are the other 9 contact points.
  • Figure 4: The process of converting contact points estimated by Vysics to the prediction space of Amodal3R for contact-guidance in real-world experiments. $T_V, T_A$ are the poses of objects $Q_V, Q_A$ in the camera frame. $T^A_V$ converts between the two object frames.
  • Figure 5: Examples of the RGB and normal rendering of the ground truth (GT) objects and the generated objects.
  • ...and 1 more figure
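
Two quantities named in the captions above can be made concrete with a small sketch: the one-sided point-wise Chamfer distance (Figure 2) and the transform $T^A_V$ between the two object frames (Figure 4). This is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, shapes are represented as sampled point sets, and $T_V, T_A$ are assumed to be 4x4 homogeneous object-to-camera poses, which gives $T^A_V = T_A^{-1} T_V$. Any scale difference between the two object frames is not modeled here.

```python
import numpy as np


def one_sided_chamfer(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Distance from each source point to its nearest target point.

    source: (N, 3), target: (M, 3). Returns an (N,) array of per-point
    distances; their mean is a one-sided Chamfer distance source -> target.
    """
    # Broadcast to all pairs: (N, 1, 3) - (1, M, 3) -> (N, M, 3).
    diff = source[:, None, :] - target[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist.min(axis=1)


def relative_transform(T_V: np.ndarray, T_A: np.ndarray) -> np.ndarray:
    """Transform taking points from object frame V to object frame A.

    Assumption: T_V and T_A map object-frame points to the camera frame,
    so p_cam = T_V @ p_V = T_A @ p_A, hence p_A = inv(T_A) @ T_V @ p_V.
    """
    return np.linalg.inv(T_A) @ T_V


def apply_transform(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]


# Toy point sets standing in for sampled predicted and ground-truth surfaces.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])

pred_to_gt = one_sided_chamfer(pred, gt)  # error shown on the prediction
gt_to_pred = one_sided_chamfer(gt, pred)  # error shown on the GT shape

# Toy poses: pure translations of each object frame in the camera frame.
T_V = np.eye(4)
T_V[:3, 3] = [1.0, 0.0, 0.0]
T_A = np.eye(4)
T_A[:3, 3] = [0.0, 2.0, 0.0]

T_A_V = relative_transform(T_V, T_A)
contact_in_V = np.array([[0.0, 0.0, 0.0]])
contact_in_A = apply_transform(T_A_V, contact_in_V)
```

The two directions of the distance differ in general: in the toy data, the ground-truth point far from the prediction contributes only to `gt_to_pred`. This is why the error on the predicted meshes and the improvement on the GT mesh are visualized separately. The brute-force pairwise computation is quadratic in the number of points; a spatial index such as a k-d tree would be the usual choice for dense samplings.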