PixTrack: Precise 6DoF Object Pose Tracking using NeRF Templates and Feature-metric Alignment
Prajwal Chidananda, Saurabh Nair, Douglas Lee, Adrian Kaehler
TL;DR
PixTrack addresses robust 6DoF object pose tracking from monocular RGB and RGB-D images by representing the target object with an object-centered NeRF (Object-NeRF) and applying PixLoc-style feature-metric optimization to novel-view renderings. For each frame, the method renders a reference view at the previous frame's pose, computes multi-scale feature and depth residuals, and optimizes an SE(3) pose update, without auxiliary pose networks or annotated trajectories. Training data for the Object-NeRF is collected with a turntable protocol, and the SfM pipeline uses COLMAP with enhancements to recover accurate object geometry, from which a clean Object-NeRF is extracted through NeRF differencing. Experiments on YCB-Video show that adding depth information improves accuracy and that tracking is online and jitter-free without any annotation, while caching and occlusion-aware masking keep the method efficient. The work offers a practical, annotation-free, multi-object tracking framework that combines NeRF-based canonical object representations with feature-metric optimization for robust 6DoF pose tracking in real-world scenes.
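The per-frame refinement described above can be illustrated with a toy Levenberg-Marquardt loop. This is a minimal sketch under stated assumptions, not PixTrack's code: a fixed analytic function (`query_features`) stands in for the CNN feature map of the query image, and a random 3D point set with precomputed features (`points`, `ref_feats`) stands in for the depth and features PixTrack would obtain by rendering the Object-NeRF at the previous pose. It uses a single scale, no robust loss, no depth residual, and no occlusion masking. The pose update left-multiplies a rotation increment and adds a translation increment, a common simplification of the full SE(3) exponential map, and the Jacobian comes from finite differences rather than analytic derivatives. All names, intrinsics, and poses are hypothetical.

```python
import numpy as np


def skew(w):
    """3x3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    A = skew(w / theta)
    return np.eye(3) + np.sin(theta) * A + (1.0 - np.cos(theta)) * (A @ A)


def project(K, X):
    """Pinhole projection of camera-frame points (N, 3) to pixels (N, 2)."""
    uvw = X @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def query_features(uv):
    """Hypothetical smooth 4-channel 'feature map' standing in for CNN features."""
    u, v = uv[:, 0], uv[:, 1]
    return np.stack([np.sin(0.10 * u), np.cos(0.10 * v),
                     np.sin(0.07 * (u + v)), np.cos(0.05 * (u - v))], axis=1)


def residuals(R, t, points, ref_feats, K):
    """Feature-metric residual: query features at projected points minus reference features."""
    uv = project(K, points @ R.T + t)
    return (query_features(uv) - ref_feats).ravel()


def apply_update(R, t, delta):
    """Left-multiplied rotation increment plus additive translation increment."""
    return so3_exp(delta[:3]) @ R, t + delta[3:]


def numerical_jacobian(R, t, points, ref_feats, K, eps=1e-6):
    """Central-difference Jacobian of the residuals w.r.t. the 6-D update."""
    J = np.zeros((points.shape[0] * ref_feats.shape[1], 6))
    for i in range(6):
        d = np.zeros(6)
        d[i] = eps
        Rp, tp = apply_update(R, t, d)
        Rm, tm = apply_update(R, t, -d)
        J[:, i] = (residuals(Rp, tp, points, ref_feats, K)
                   - residuals(Rm, tm, points, ref_feats, K)) / (2.0 * eps)
    return J


def refine_pose(R, t, points, ref_feats, K, iters=50, lam=1e-3):
    """Levenberg-Marquardt on the feature-metric cost, starting from (R, t)."""
    cost = 0.5 * np.sum(residuals(R, t, points, ref_feats, K) ** 2)
    for _ in range(iters):
        r = residuals(R, t, points, ref_feats, K)
        J = numerical_jacobian(R, t, points, ref_feats, K)
        H, g = J.T @ J, J.T @ r
        delta = -np.linalg.solve(H + lam * np.diag(np.diag(H)), g)
        R_new, t_new = apply_update(R, t, delta)
        new_cost = 0.5 * np.sum(residuals(R_new, t_new, points, ref_feats, K) ** 2)
        if new_cost < cost:  # accept the step and trust the quadratic model more
            R, t, cost = R_new, t_new, new_cost
            lam = max(lam / 10.0, 1e-9)
        else:  # reject the step and damp more heavily
            lam *= 10.0
        if np.linalg.norm(delta) < 1e-10:
            break
    return R, t, cost


# Toy scene: hypothetical intrinsics, object points, and ground-truth pose.
rng = np.random.default_rng(0)
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
points = rng.uniform(-1.0, 1.0, size=(60, 3))
R_gt = so3_exp(np.array([0.1, -0.2, 0.05]))
t_gt = np.array([0.2, -0.1, 5.0])
ref_feats = query_features(project(K, points @ R_gt.T + t_gt))

# Perturbed initial pose, playing the role of the previous frame's estimate.
R0 = so3_exp(np.array([0.03, 0.02, -0.04])) @ R_gt
t0 = t_gt + np.array([0.05, -0.05, 0.1])
init_cost = 0.5 * np.sum(residuals(R0, t0, points, ref_feats, K) ** 2)

R_est, t_est, final_cost = refine_pose(R0, t0, points, ref_feats, K)
cos_angle = np.clip((np.trace(R_est @ R_gt.T) - 1.0) / 2.0, -1.0, 1.0)
rot_err = float(np.arccos(cos_angle))
print(f"cost {init_cost:.3e} -> {final_cost:.3e}, rotation error {rot_err:.2e} rad")
```

On this noise-free toy problem the loop should drive the feature residual to near zero from the perturbed start. The basin of convergence depends on how smooth the features are relative to the initial pose error; coarse-to-fine, multi-scale features are a standard way to widen it.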
Abstract
We present PixTrack, a vision-based object pose tracking framework using novel view synthesis and deep feature-metric alignment. We follow an SfM-based relocalization paradigm in which a Neural Radiance Field canonically represents the tracked object. Our evaluations demonstrate that our method produces highly accurate, robust, and jitter-free 6DoF pose estimates of objects in both monocular RGB images and RGB-D images without the need for any data annotation or trajectory smoothing. Our method is also computationally efficient, making multi-object tracking straightforward through simple CPU multiprocessing, with no alteration to our algorithm. Our code is available at: https://github.com/GiantAI/pixtrack
