Vysics: Object Reconstruction Under Occlusion by Fusing Vision and Contact-Rich Physics
Bibit Bianchini, Minghan Zhu, Mengti Sun, Bowen Jiang, Camillo J. Taylor, Michael Posa
TL;DR
Vysics tackles occlusion in robotic object modeling by unifying vision-based geometry and contact-driven dynamics. It uses BundleSDF to recover the visible geometry and the Physics Learning Library (PLL) to infer "physible" geometry from the tracked trajectory, jointly optimizing a signed distance function (SDF) representation and inertia to produce a complete object model without pretraining or tactile or force sensors. A mutual supervision scheme lets geometry be inferred from both visual observations and contact interactions, and the result is exported as a URDF/mesh suitable for simulation. Experiments on occluded RGBD data show improved geometric accuracy and more reliable dynamics predictions compared with vision-only baselines, highlighting the value of physics as a prior for shape reconstruction in low-data regimes.
Abstract
We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot's proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, observed motion of partially occluded objects can imply that physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with "physible" geometry, which best explains the observed object motion through physics. Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the "physible" geometry from the trajectory through implicit contact dynamics optimization. The visible and "physible" geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics requires neither pretraining nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion. Project page: https://vysics-vision-and-physics.github.io/
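To make the idea of combining visible and "physible" supervision on one SDF concrete, the following is a deliberately tiny sketch and not the Vysics, BundleSDF, or PLL implementation. It assumes a 1D "object" that is a vertical slab with unknown bottom and top heights, a camera that only sees the top surface, and a single inferred contact with a table at height 0. The names `slab_sdf`, `fit_slab`, and `w_contact`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def slab_sdf(z, lo, hi):
    """Signed distance from height z to the slab [lo, hi] (negative inside)."""
    return np.maximum(lo - z, z - hi)


def fit_slab(visible_z, contact_z, lo=0.3, hi=0.5,
             w_contact=1.0, lr=0.25, steps=100):
    """Fit slab bounds so that the SDF is zero at the supervised points.

    visible_z: heights of surface points seen by the camera (toy "vision" term).
    contact_z: heights where contact is inferred from motion (toy "physics" term).
    Each term is a mean squared SDF value; gradients are written out by hand.
    """
    visible_z = np.asarray(visible_z, dtype=float)
    contact_z = np.asarray(contact_z, dtype=float)
    for _ in range(steps):
        g_lo, g_hi = 0.0, 0.0
        for pts, w in ((visible_z, 1.0), (contact_z, w_contact)):
            if pts.size == 0:
                continue  # this supervision source is unavailable
            d = slab_sdf(pts, lo, hi)
            # Which branch of the max is active decides which bound moves.
            lower_active = (lo - pts) >= (pts - hi)
            g_lo += w * np.mean(2.0 * d * lower_active)       # d(sdf)/d(lo) = +1
            g_hi += w * np.mean(-2.0 * d * ~lower_active)     # d(sdf)/d(hi) = -1
        lo -= lr * g_lo
        hi -= lr * g_hi
    return lo, hi


# Vision only: the top is recovered, but the occluded bottom never moves
# from its initial guess because no term constrains it.
lo_v, hi_v = fit_slab(visible_z=[1.0, 1.0, 1.0], contact_z=[])

# Vision plus contact: the inferred table contact pins down the bottom too.
lo_c, hi_c = fit_slab(visible_z=[1.0, 1.0, 1.0], contact_z=[0.0])

print("vision only     :", lo_v, hi_v)
print("vision + contact:", lo_c, hi_c)
```

In this toy setting the vision term alone leaves the occluded bound at its initial guess, while adding the contact term recovers it. The real method optimizes a full 3D SDF together with inertial parameters through contact dynamics, which this sketch does not attempt to reproduce.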
