NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen, Thibault Groueix, Yinlin Hu, Mathieu Salzmann, Vincent Lepetit
TL;DR
NOPE estimates the relative 3D pose of a previously unseen object from a single reference image, without a 3D model and without retraining. A U-Net with attention, conditioned on the relative pose, learns to predict the average embedding of novel views of the object; the pose is then recovered by fast template matching over a fixed set of viewpoints. The method also detects pose ambiguities caused by symmetry or occlusion. It generalizes to novel categories on ShapeNet and gives robust results on T-LESS, including under occlusion, with a runtime of around 1 s on a single GPU, which makes it a practical option for rapid pose estimation in robotics and AR.
Abstract
The practicality of 3D object pose estimation remains limited for many applications due to the need for prior knowledge of a 3D model and a training period for new objects. To address this limitation, we propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images, without prior knowledge of the object's 3D model and without requiring training time for new objects and categories. We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object. The prediction uses a simple U-Net architecture with attention, conditioned on the desired pose, which yields extremely fast inference. We compare our approach to state-of-the-art methods and show that it outperforms them in both accuracy and robustness. Our source code is publicly available at https://github.com/nv-nguyen/nope
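
The template-matching step mentioned in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each embedding has been flattened to a vector, uses cosine similarity as the matching score, and turns the scores into a distribution with a softmax. The names `match_pose` and `is_ambiguous` and the parameters `temperature` and `ratio` are hypothetical and chosen only for illustration.

```python
import numpy as np

def match_pose(query_emb, template_embs, temperature=0.1):
    """Pick the candidate pose whose predicted embedding best matches the query.

    query_emb:     (D,) embedding of the query image (assumed flattened).
    template_embs: (N, D) embeddings predicted for N candidate relative poses.
    Returns the index of the best candidate and a probability per candidate.
    """
    q = query_emb / np.linalg.norm(query_emb)
    t = template_embs / np.linalg.norm(template_embs, axis=1, keepdims=True)
    sims = t @ q                        # cosine similarity for each candidate pose
    logits = sims / temperature
    logits = logits - logits.max()      # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.argmax(sims)), probs

def is_ambiguous(probs, ratio=0.9):
    """Crude ambiguity check: is the runner-up almost as likely as the best?

    A practical check would ignore candidates that are merely neighbouring
    viewpoints of the best one; this sketch does not (assumption).
    """
    second, first = np.sort(probs)[-2:]
    return bool(second >= ratio * first)

# Toy usage with random vectors standing in for predicted embeddings.
rng = np.random.default_rng(0)
templates = rng.normal(size=(5, 8))
query = templates[3] + 0.01 * rng.normal(size=8)
best, probs = match_pose(query, templates)
```

Because every candidate embedding is compared with the query in one matrix-vector product, the cost of matching grows only linearly with the number of viewpoints. When two distant viewpoints receive nearly equal probability, as for a symmetric object, a check like `is_ambiguous` would flag the estimate instead of silently returning one of them.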
