Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera
Haixin Shi, Yinlin Hu, Daniel Koguciuk, Juan-Ting Lin, Mathieu Salzmann, David Ferstl
TL;DR
The paper addresses reconstructing free-moving objects and estimating their pose from monocular RGB video, without relying on priors or on splitting the sequence into segments. It introduces a virtual camera that centers the optimization on the object, which enables globally consistent joint optimization of shape and pose. The object is represented as an implicit neural surface that is learned from the video and rendered via volume rendering. A segment-free progressive training scheme provides robust initialization, and a real-camera refinement step (PnP with RANSAC) produces the accurate final result. Evaluations on HO3D and on egocentric RGB sequences show significant improvements over prior pose-free methods and results competitive with methods that use hand or object priors, which broadens applicability to AR/VR and robotics.
Abstract
We propose an approach for reconstructing free-moving objects from a monocular RGB video. Most existing methods either assume a scene prior, a hand pose prior, or an object category pose prior, or rely on local optimization over multiple sequence segments. Our method allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence globally without splitting it into segments. We progressively optimize the object shape and pose simultaneously, based on an implicit neural representation. A key aspect of our method is a virtual camera system that significantly reduces the search space of the optimization. We evaluate our method on the standard HO3D dataset and on a collection of egocentric RGB sequences captured with a head-mounted device. We demonstrate that our approach significantly outperforms most methods and is on par with recent techniques that assume prior information.
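The abstract does not spell out how the virtual camera is constructed. One plausible reading, given here purely as an illustration and not as the authors' implementation, is a rotation of the real camera that makes its optical axis pass through the object's 2D center. Under that assumption the object's lateral offset is absorbed into a known rotation, and the remaining translation lies mostly along the viewing axis, which is one way the search space could shrink. The sketch below assumes a pinhole camera with intrinsics `fx, fy, cx, cy`; all function names and the sample pixel coordinates are hypothetical.

```python
import math


def backproject_unit_ray(fx, fy, cx, cy, u, v):
    """Unit viewing ray through pixel (u, v) for a pinhole camera."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply3(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def virtual_camera_rotation(fx, fy, cx, cy, u, v):
    """Rotation R (real camera -> assumed virtual camera) with R @ ray = +z.

    Uses the Rodrigues-style formula for aligning unit vector d with
    z = (0, 0, 1): R = I + [w]x + [w]x^2 / (1 + d.z), where w = d x z.
    """
    dx, dy, dz = backproject_unit_ray(fx, fy, cx, cy, u, v)
    wx, wy, wz = dy, -dx, 0.0  # w = d x z
    skew = [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]
    skew2 = matmul3(skew, skew)
    k = 1.0 / (1.0 + dz)  # dz > 0 for any pixel, so no division by zero
    return [[(1.0 if i == j else 0.0) + skew[i][j] + k * skew2[i][j]
             for j in range(3)]
            for i in range(3)]


# Hypothetical example: object center detected at pixel (500, 100).
R = virtual_camera_rotation(600.0, 600.0, 320.0, 240.0, 500.0, 100.0)
ray = backproject_unit_ray(600.0, 600.0, 320.0, 240.0, 500.0, 100.0)
axis_in_virtual_frame = apply3(R, ray)  # points along the optical axis
```

When the object center coincides with the principal point, the rotation reduces to the identity, so the virtual camera coincides with the real one.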
