Vysics: Object Reconstruction Under Occlusion by Fusing Vision and Contact-Rich Physics

Bibit Bianchini, Minghan Zhu, Mengti Sun, Bowen Jiang, Camillo J. Taylor, Michael Posa

TL;DR

Vysics tackles occlusion in robotic object modeling by unifying vision-based geometry and contact-driven dynamics. It uses BundleSDF for the visible geometry and trajectory-driven PLL for the "physible" geometry, jointly optimizing a signed distance function (SDF) representation and inertia to produce a complete object model without pretraining or tactile or force sensors. A mutual supervision scheme lets geometry be inferred from both visual observations and contact interactions, and the resulting model is exported as a URDF/mesh suitable for simulation. Experiments on occluded RGBD data show improved geometric accuracy and more reliable dynamics predictions compared with vision-only baselines, highlighting the value of physics as a prior for shape reconstruction in low-data regimes.

Abstract

We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot's proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, observed motion of partially occluded objects can imply physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with "physible geometry," which best explains the observed object motion through physics. Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the "physible" geometry from the trajectory through implicit contact dynamics optimization. The visible and "physible" geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics does not require pretraining, nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion. Project page: https://vysics-vision-and-physics.github.io/

Paper Structure

This paper contains 33 sections, 12 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Vision-based shape reconstruction (projection shown in green) can be limited by occlusion. Fusing vision and contact-rich physics, our method recovers the occluded geometry through object interactions with the robot and environment. The robot end effector in yellow shows the robot-object interaction.
  • Figure 2: A 2D depiction of the physical meaning of a DSF and its implication for the SDF. Points in shades of green have exact SDF values and are subject to the support-point loss, and the signed distance of the orange example point $\mathbf{q}$ can be lower-bounded by the supporting hyperplane, as in the hyperplane loss.
  • Figure 3: Detailed Vysics diagram. Blue arrows denote the vision-based information flow through BundleSDF [wen2023bundlesdf], and green arrows denote the flow through PLL [bianchini2023simultaneous, pfrommer2020contactnets]. Purple arrows indicate the unifying connections Vysics makes to factor both vision and contact-rich physics into the geometry learning problem.
  • Figure 4: Visualization of the loss functions as the incorporation of vision and contact dynamics. Blue represents the geometry learned from vision, and green represents the geometry learned from contact dynamics.
  • Figure 5: The 7 objects and their names in our dataset.
  • ...and 7 more figures
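
The geometric relationship described in the Figure 2 caption can be illustrated with a small numerical sketch. It is not the paper's implementation: it assumes the DSF behaves like the support function of a convex shape, and it swaps in a hypothetical 2D box (with the illustrative names `VERTICES`, `support`, `box_sdf`, and `hyperplane_lower_bound`) in place of any learned model. Under those assumptions, points lying along the outward direction from a support point have an exactly known signed distance, while for any other point the supporting hyperplane gives only a lower bound.

```python
import numpy as np

# Hypothetical convex shape: an axis-aligned 2D box with half-extents (1.0, 0.5).
HALF_EXTENTS = np.array([1.0, 0.5])
VERTICES = np.array([[1.0, 0.5], [1.0, -0.5], [-1.0, 0.5], [-1.0, -0.5]])


def support(direction):
    """Return (height, point): the support function value and a support point.

    The height is the maximum of n . p over the shape for the unit direction n;
    for a polytope the maximum is attained at a vertex.
    """
    n = direction / np.linalg.norm(direction)
    heights = VERTICES @ n
    idx = int(np.argmax(heights))
    return float(heights[idx]), VERTICES[idx]


def box_sdf(q):
    """Exact signed distance from q to the box (negative inside)."""
    d = np.abs(q) - HALF_EXTENTS
    outside = np.linalg.norm(np.maximum(d, 0.0))
    inside = min(float(np.max(d)), 0.0)
    return float(outside + inside)


def hyperplane_lower_bound(q, direction):
    """Signed distance from q to the supporting hyperplane with normal n.

    For a convex shape this never exceeds the true signed distance.
    """
    n = direction / np.linalg.norm(direction)
    height, _ = support(n)
    return float(n @ q - height)


if __name__ == "__main__":
    n = np.array([1.0, 1.0])
    _, p = support(n)

    # A point offset from the support point along n: the bound is tight here.
    q_exact = p + 0.7 * n / np.linalg.norm(n)
    print(box_sdf(q_exact), hyperplane_lower_bound(q_exact, n))

    # An arbitrary point: the hyperplane only bounds the distance from below.
    q_other = np.array([-2.0, 1.5])
    print(box_sdf(q_other), hyperplane_lower_bound(q_other, n))
```

In this toy setting, the first kind of point would support an equality-style loss on the SDF value, and the second an inequality-style penalty applied only when the SDF falls below the bound. How Vysics actually weights and samples these terms is not specified in this excerpt.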