Fit-NGP: Fitting Object Models to Neural Graphics Primitives
Marwan Taher, Ignacio Alzugaray, Andrew J. Davison
TL;DR
Fit-NGP is a fully automatic, RGB-only pipeline for precise 6-DoF pose estimation of objects with known 3D models. It uses the density field produced by Instant-NGP as an intermediate representation of the scene. A multi-hypothesis optimization aligns CAD or reconstructed object models to this density field: points on the model surface and points derived from its normals define a differentiable fitness objective, and poses are refined with AdamW. The approach achieves millimetre-level translation accuracy and rotation errors of a few degrees on small, reflective objects within roughly two minutes, and it scales to multiple objects in a scene. The work shows that neural density fields are a practical intermediate representation for high-precision robotic manipulation with a single RGB camera, robust to challenging lighting and materials while remaining automatic and reproducible.
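The idea of scoring several pose hypotheses against a density field can be illustrated with a minimal sketch. This is not the authors' implementation. An analytic box density stands in for a trained Instant-NGP field, the pose is reduced to yaw plus translation instead of full 6-DoF, and the names `make_density`, `sample_model`, and `fitness` are hypothetical. The fitness used here (mean density at surface points minus mean density at points pushed outward along the normals) is an assumed form chosen for illustration. Gradient-based refinement of each hypothesis, which the summary attributes to AdamW, is omitted.

```python
import math

HALF = (0.03, 0.01, 0.005)  # hypothetical box half-extents in metres
SHARP = 0.001               # assumed: density falls off over about 1 mm

def rot_z(yaw):
    """Rotation matrix about the z axis (yaw-only simplification of 6-DoF)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def apply(R, t, p):
    """Rigid transform R @ p + t for a single 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def box_sdf(p):
    """Signed distance to the box surface (negative inside)."""
    q = [abs(p[i]) - HALF[i] for i in range(3)]
    outside = math.sqrt(sum(max(v, 0.0) ** 2 for v in q))
    return outside + min(max(q), 0.0)

def make_density(yaw, t):
    """Toy stand-in for a trained density field: near 1 inside, near 0 outside."""
    inv_R = tuple(zip(*rot_z(yaw)))  # transpose equals the inverse rotation
    def density(p):
        shifted = tuple(p[i] - t[i] for i in range(3))
        local = apply(inv_R, (0.0, 0.0, 0.0), shifted)
        x = min(box_sdf(local) / SHARP, 60.0)  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(x))
    return density

def sample_model(offset=0.002):
    """Points on the box faces, plus copies pushed outward along the face normals."""
    surface, outer = [], []
    for axis in range(3):
        u_ax, v_ax = [a for a in range(3) if a != axis]
        for sign in (-1.0, 1.0):
            for u in (-0.5, 0.0, 0.5):
                for v in (-0.5, 0.0, 0.5):
                    p = [0.0, 0.0, 0.0]
                    p[axis] = sign * HALF[axis]
                    p[u_ax] = u * HALF[u_ax]
                    p[v_ax] = v * HALF[v_ax]
                    surface.append(tuple(p))
                    p[axis] += sign * offset
                    outer.append(tuple(p))
    return surface, outer

def fitness(density, surface, outer, yaw, t):
    """Higher is better: dense at the surface, empty just outside it."""
    R = rot_z(yaw)
    on = sum(density(apply(R, t, p)) for p in surface) / len(surface)
    off = sum(density(apply(R, t, p)) for p in outer) / len(outer)
    return on - off

# The object's actual pose in the toy scene (unknown to a real system).
true_yaw, true_t = 0.3, (0.10, -0.05, 0.02)
density = make_density(true_yaw, true_t)
surface, outer = sample_model()

# Three hypotheses: the correct pose, a 90-degree yaw error, a 10 cm offset.
hypotheses = [
    (true_yaw, true_t),
    (true_yaw + math.pi / 2, true_t),
    (true_yaw, (0.20, -0.05, 0.02)),
]
scores = [fitness(density, surface, outer, yaw, t) for yaw, t in hypotheses]
best = max(range(len(hypotheses)), key=scores.__getitem__)
print("best hypothesis:", best, "scores:", [round(s, 3) for s in scores])
```

In this toy setup the correctly posed hypothesis scores highest, because only there do the surface points sit on the density boundary while the offset points fall in empty space. In a real pipeline the density would be queried from the trained field, and each hypothesis would be refined by gradient steps before the comparison.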
Abstract
Accurate 3D object pose estimation is key to enabling many robotic applications that involve challenging object interactions. In this work, we show that the density field created by a state-of-the-art efficient radiance field reconstruction method is suitable for highly accurate and robust pose estimation for objects with known 3D models, even when they are very small and have challenging reflective surfaces. We present a fully automatic object pose estimation system based on a robot arm with a single wrist-mounted camera, which can scan a scene from scratch and then detect and estimate the 6 Degrees of Freedom (DoF) poses of multiple objects within a couple of minutes of operation. The poses of small objects such as bolts and nuts are estimated with accuracy on the order of 1 mm.
