KeyMatchNet: Zero-Shot Pose Estimation in 3D Point Clouds by Generalized Keypoint Matching
Frederik Hagelskjær, Rasmus Laurvig Haugaard
TL;DR
The paper tackles zero-shot pose estimation in 3D point clouds using depth-only data, targeting industrial scenarios where color information is seldom available. It introduces KeyMatchNet, a dual-branch network that computes object and scene features in parallel and matches object keypoints to scene points, followed by Kabsch-RANSAC for pose estimation. The authors show that precomputing object features and running RANSAC on the GPU yield fast runtimes while maintaining reasonable accuracy, and they validate the approach on synthetic, out-of-class, and real data, including a dataset of 1,500 CAD models in homogeneous-bin scenarios. Results indicate strong generalization to unseen objects and performance competitive with RGB-based methods even though no color is used. This work may enable practical deployment for industrial bin-picking and similar tasks without collecting data or training for each new object.
Abstract
In this paper, we present KeyMatchNet, a novel network for zero-shot pose estimation in 3D point clouds. Our method uses only depth information, which makes it well suited to many industrial use cases, where color information is seldom available. The network is composed of two parallel components that compute object and scene features. The features are then combined to create matches used for pose estimation. The parallel structure allows each part to be pre-processed individually, which decreases the run-time. Using a zero-shot network allows for a very short set-up time, as it is not necessary to train models for new objects. However, as the network is not trained for the specific object, zero-shot pose estimation methods generally have lower accuracy than conventional methods. To address this, we reduce the complexity of the task by including the scenario information during training. This is typically not feasible, as collecting real data for new tasks drastically increases the cost. However, for zero-shot pose estimation, training for new objects is not necessary, and the expensive data collection thus needs to be performed only once. Our method is trained on 1,500 objects and is tested only on unseen objects. We demonstrate that the trained network not only accurately estimates poses for novel objects but also generalizes to objects outside the trained class. Test results are also shown on real data. We believe that the presented method is valuable for many real-world scenarios. The project page is available at keymatchnet.github.io
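To make the final step concrete, the following is a minimal NumPy sketch of how a pose could be recovered from keypoint matches with the Kabsch-RANSAC scheme named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the function names `kabsch` and `kabsch_ransac`, the three-point sampling, the iteration count, and the inlier threshold are all hypothetical choices, and the loop runs on the CPU rather than the GPU. It assumes the network has already produced one matched scene point for each object keypoint.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def kabsch_ransac(obj_pts, scene_pts, iters=200, inlier_thresh=0.01, seed=0):
    """Estimate a pose from putative matches obj_pts[i] <-> scene_pts[i].

    iters and inlier_thresh are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(obj_pts)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        # Three matches are the minimal sample for a rigid transform.
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(obj_pts[idx], scene_pts[idx])
        residuals = np.linalg.norm(obj_pts @ R.T + t - scene_pts, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no consensus set with at least three inliers found")
    # Refit on the full consensus set for the final estimate.
    R, t = kabsch(obj_pts[best_inliers], scene_pts[best_inliers])
    return R, t, best_inliers
```

Calling `kabsch_ransac(obj_keypoints, matched_scene_points)` returns a rotation matrix, a translation vector, and a boolean inlier mask. Because each hypothesis only needs a 3x3 SVD and a residual check, many hypotheses can be evaluated independently, which is what makes this step amenable to batching on a GPU.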
