Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception
Sebastian Jung, Leonard Klüpfel, Rudolph Triebel, Maximilian Durner
TL;DR
The paper addresses reliable perception of unseen objects from a few RGB template views, without camera calibration or retraining. It introduces NeMO, a geometry-aware, object-centric representation that encodes object geometry as a sparse 3D point cloud derived from a learned unsigned distance function (UDF). Because object information is decoupled from the network weights, a single network can perform detection, segmentation, and 6DoF pose estimation in both model-free and model-based settings. The encoder–decoder architecture is trained with a combination of geometric and dense-prediction losses and supports offline precomputation and scalable multi-view fusion. It achieves competitive or state-of-the-art results on the BOP benchmark and also provides qualitative surface reconstruction. The approach is complemented by a synthetic, object-centric dataset and extensive supplementary analyses, which highlight its potential for quick object onboarding and robust generalization to novel instances without retraining.
Abstract
We present Neural Memory Object (NeMO), a novel object-centric representation that can be used to detect, segment, and estimate the 6DoF pose of objects unseen during training using RGB images. Our method consists of an encoder that requires only a few RGB template views depicting an object to generate a sparse, object-like point cloud, containing semantic and geometric information, using a learned unsigned distance function (UDF). Next, a decoder takes the object encoding together with a query image to generate a variety of dense predictions. Through extensive experiments, we show that our method can be used for few-shot object perception without requiring any camera-specific parameters or retraining on target data. Our proposed concept of outsourcing object information into a NeMO and using a single network for multiple perception tasks enhances interaction with novel objects, improving scalability and efficiency by enabling quick object onboarding without retraining or extensive pre-processing. We report competitive and state-of-the-art results on various datasets and perception tasks of the BOP benchmark, demonstrating the versatility of our approach. Project repository: https://github.com/DLR-RM/nemo
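To illustrate how a UDF can yield a sparse surface point cloud, the sketch below projects random 3D samples onto the zero level set by moving each point along the negative UDF gradient by its current distance. This is a generic projection scheme and not necessarily the procedure used in NeMO. The analytic `sphere_udf` is a hypothetical stand-in for the learned UDF, and all function names and parameters are assumptions made for this example.

```python
import numpy as np

def sphere_udf(points, radius=1.0):
    """Stand-in UDF (assumption): unsigned distance to a sphere at the origin."""
    return np.abs(np.linalg.norm(points, axis=1) - radius)

def numerical_gradient(udf, points, eps=1e-4):
    """Central-difference gradient of the UDF at each point."""
    grad = np.zeros_like(points)
    for axis in range(points.shape[1]):
        offset = np.zeros(points.shape[1])
        offset[axis] = eps
        grad[:, axis] = (udf(points + offset) - udf(points - offset)) / (2 * eps)
    return grad

def project_to_surface(udf, points, steps=5):
    """Move each point along the negative UDF gradient by its current distance."""
    points = points.copy()
    for _ in range(steps):
        dist = udf(points)
        grad = numerical_gradient(udf, points)
        norm = np.linalg.norm(grad, axis=1, keepdims=True)
        direction = grad / np.maximum(norm, 1e-12)  # guard against zero gradients
        points = points - dist[:, None] * direction
    return points

# Random samples in a cube around the object become a sparse surface point cloud.
rng = np.random.default_rng(0)
samples = rng.uniform(-1.5, 1.5, size=(256, 3))
surface_points = project_to_surface(sphere_udf, samples)
max_residual = float(sphere_udf(surface_points).max())
```

With a learned UDF, the gradient would typically come from automatic differentiation rather than finite differences, and several projection steps may be needed because the predicted distances are only approximate.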
