Distilling 3D distinctive local descriptors for 6D pose estimation
Amir Hamza, Andrea Caraffa, Davide Boscaini, Fabio Poiesi
TL;DR
Zero-shot 6D pose estimation relies on powerful 3D local descriptors such as GeDi, but GeDi's slow inference limits its practical use. The authors introduce dGeDi, an object-oriented distillation framework in which a fast PointTransformerV3-based student learns to regress GeDi descriptors, trained with correspondence-based supervision from a frozen GeDi teacher. A novel loss that downweights unreliable supervision, together with a scalable training strategy, enables training on large-scale synthetic data while preserving the discriminative quality of the descriptors. The result is a substantial runtime reduction (over 170x faster) with competitive accuracy. This work moves zero-shot 6D pose estimation closer to real-time feasibility and opens avenues for efficient geometric reasoning in robotics.
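The way local descriptors feed 6D pose estimation can be illustrated with a generic sketch: match object and scene descriptors, then fit a rigid transform to the matched 3D points. This is not the authors' pipeline. It assumes mutual nearest-neighbour matching and a plain Kabsch (SVD) fit with no outlier rejection, and the function names (`mutual_nn_matches`, `kabsch`, `estimate_pose`) are hypothetical. Real systems typically add RANSAC and a KD-tree in place of the dense distance matrix used here.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Hypothetical helper: index pairs (i, j) where desc_a[i] and desc_b[j]
    are each other's nearest neighbour in descriptor space."""
    # Dense pairwise Euclidean distances (fine for a toy example, O(N*M*D) memory).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = d.argmin(axis=1)  # closest b for each a
    nn_ba = d.argmin(axis=0)  # closest a for each b
    idx_a = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx_a  # keep only mutual matches
    return idx_a[keep], nn_ab[keep]

def kabsch(src, dst):
    """Least-squares rotation R and translation t with dst ~= src @ R.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

def estimate_pose(obj_pts, obj_desc, scene_pts, scene_desc):
    """Hypothetical end-to-end sketch: descriptor matching, then rigid fit."""
    i, j = mutual_nn_matches(obj_desc, scene_desc)
    return kabsch(obj_pts[i], scene_pts[j])
```

With noise-free, perfectly distinctive descriptors every match is correct and the fit is exact. With real descriptors, the quality of the matches, and hence of the pose, depends on how distinctive the descriptors are.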
Abstract
Three-dimensional local descriptors are crucial for encoding geometric surface properties, making them essential for various point cloud understanding tasks. Among these descriptors, GeDi has demonstrated strong zero-shot 6D pose estimation capabilities but remains computationally impractical for real-world applications due to its expensive inference process. Can we retain GeDi's effectiveness while significantly improving its efficiency? In this paper, we explore this question by introducing a knowledge distillation framework that trains an efficient student model to regress local descriptors from a GeDi teacher. Our key contributions include: an efficient large-scale training procedure that ensures robustness to occlusions and partial observations while operating under compute and storage constraints, and a novel loss formulation that handles weak supervision from non-distinctive teacher descriptors. We validate our approach on five BOP Benchmark datasets and demonstrate a significant reduction in inference time while maintaining competitive performance with existing methods, bringing zero-shot 6D pose estimation closer to real-time feasibility. Project Website: https://tev-fbk.github.io/dGeDi/
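The abstract does not specify the loss that handles weak supervision from non-distinctive teacher descriptors. The sketch below illustrates the general idea under a purely hypothetical weighting: each point's regression error is scaled by how far its teacher descriptor lies from the nearest other teacher descriptor of the same object. Repetitive, non-distinctive descriptors therefore contribute little. The weighting scheme and the names `distinctiveness_weights` and `weighted_distillation_loss` are assumptions for illustration, not the paper's formulation, and NumPy stands in for an autodiff framework.

```python
import numpy as np

def distinctiveness_weights(teacher, eps=1e-8):
    """Hypothetical weight per point: distance from each teacher descriptor to
    its nearest *other* teacher descriptor, rescaled to [0, 1]."""
    d = np.linalg.norm(teacher[:, None, :] - teacher[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each descriptor's distance to itself
    nn_dist = d.min(axis=1)
    return nn_dist / (nn_dist.max() + eps)

def weighted_distillation_loss(student, teacher, eps=1e-8):
    """Squared regression error between student and teacher descriptors,
    averaged with the (assumed) distinctiveness weights."""
    w = distinctiveness_weights(teacher)
    per_point = np.sum((student - teacher) ** 2, axis=1)
    return float(np.sum(w * per_point) / (np.sum(w) + eps))
```

For example, if two teacher descriptors are identical, both receive zero weight, so student errors on those points do not affect the loss at all. Only errors on distinctive points are penalised.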
