EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy
Haodi Yao, Fenghua He, Ning Hao, Chen Xie
TL;DR
EdgePoint2 targets real-time, accurate keypoint detection and description on resource-constrained edge devices. It separates a compact feature encoder from the detection and description heads and, crucially, adds a descriptor-distillation framework that preserves the structure of the embedding space in low dimensions. The core novelty is the combination of an Orthogonal Procrustes loss with a similarity loss to distill teacher descriptors into compact student descriptors of a different dimension, enabling $32$/$48$/$64$-dimensional representations while maintaining state-of-the-art (SOTA) accuracy. The authors offer $14$ sub-models, validate them on multiple benchmarks (HPatches, MegaDepth/ScanNet, IMC2022, Aachen/InLoc), and demonstrate efficiency and robustness on both GPU-enabled edge accelerators and CPU-only devices, including real-time inference on ARM. By reducing bandwidth and computation while maintaining high matching and localization accuracy, the work makes dense, reliable keypoint pipelines more practical to deploy in distributed vision systems.
Abstract
The field of keypoint extraction, which is essential for vision applications like Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM), has evolved from relying on handcrafted methods to leveraging deep learning techniques. While deep learning approaches have significantly improved performance, they often incur substantial computational costs, limiting their deployment in real-time edge applications. Efforts to create lightweight neural networks have seen some success, yet they often result in trade-offs between efficiency and accuracy. Additionally, the high-dimensional descriptors generated by these networks pose challenges for distributed applications requiring efficient communication and coordination, highlighting the need for compact yet competitively accurate descriptors. In this paper, we present EdgePoint2, a series of lightweight keypoint detection and description neural networks specifically tailored for edge computing applications on embedded systems. The network architecture is optimized for efficiency without sacrificing accuracy. To train compact descriptors, we introduce a combination of Orthogonal Procrustes loss and similarity loss, which can serve as a general approach for hypersphere embedding distillation tasks. Additionally, we offer 14 sub-models to satisfy diverse application requirements. Our experiments demonstrate that EdgePoint2 consistently achieves state-of-the-art (SOTA) accuracy and efficiency across various challenging scenarios while employing lower-dimensional descriptors (32/48/64). Beyond its accuracy, EdgePoint2 offers significant advantages in flexibility, robustness, and versatility. Consequently, EdgePoint2 emerges as a highly competitive option for visual tasks, especially in contexts demanding adaptability to diverse computational and communication constraints.
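The abstract names the two distillation terms but does not give their formulas, so the following is only a generic sketch of what such a combination might look like, not the authors' formulation. It assumes L2-normalized teacher descriptors of dimension D and student descriptors of dimension d ≤ D for the same N keypoints. The Procrustes term uses the standard closed-form solution (via SVD) for a projection with orthonormal rows that lifts student descriptors into teacher space; the similarity term compares pairwise cosine-similarity matrices. The function names and the weight `alpha` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def procrustes_loss(teacher, student):
    """Mean squared residual after the best orthonormal lift of student to teacher.

    teacher: (N, D) array, student: (N, d) array, with d <= D.
    Finds P of shape (d, D) with P @ P.T = I minimizing ||student @ P - teacher||_F.
    Since ||student @ P||_F is constant under that constraint, the minimizer is
    P = U @ Vt, where U, Vt come from the thin SVD of student.T @ teacher.
    """
    m = student.T @ teacher                      # (d, D) cross-covariance
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    p = u @ vt                                   # (d, D), orthonormal rows
    residual = student @ p - teacher
    return float(np.mean(np.sum(residual ** 2, axis=1))), p


def similarity_loss(teacher, student):
    """Mean squared difference between the pairwise cosine-similarity matrices."""
    return float(np.mean((teacher @ teacher.T - student @ student.T) ** 2))


def distillation_loss(teacher, student, alpha=1.0):
    """Hypothetical weighted sum of the two terms (alpha is an assumption)."""
    loss_p, _ = procrustes_loss(teacher, student)
    return loss_p + alpha * similarity_loss(teacher, student)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = l2_normalize(rng.normal(size=(128, 256)))  # assumed teacher size
    student = l2_normalize(rng.normal(size=(128, 32)))   # 32-dim student
    print(distillation_loss(teacher, student))
```

In a training loop the same computation would be written in an autodiff framework so that gradients flow into the student; the NumPy version only illustrates the arithmetic.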
