A Graph-Based Approach for Category-Agnostic Pose Estimation
Or Hirschorn, Shai Avidan
TL;DR
Traditional pose estimation models are restricted to predefined categories, which limits their applicability to novel objects. This work introduces GraphCape, a graph-based approach that treats keypoints as nodes of a connected graph and uses a graph-transformer decoder to exploit their geometric relations. This enables accurate pose localization for unseen categories from only a few annotated support keypoints. Key contributions include (1) the GraphCape architecture, which uses a Graph-FFN and a category-aware adjacency; (2) an updated MP-100 dataset with skeleton annotations for all categories; and (3) state-of-the-art performance in both 1-shot and 5-shot category-agnostic pose estimation (CAPE) on MP-100, with improved robustness to occlusions and cross-category matching. By embedding structural priors into the decoding process, the method advances CAPE, improving generalization to diverse objects and supporting practical deployment in real-world scenes that contain many object categories.
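To make the idea of a Graph-FFN more concrete, here is a minimal plain-Python sketch. It assumes a GCN-style formulation in which each linear layer of a transformer feed-forward block is preceded by aggregation over skeleton neighbours, using the common symmetric normalization D^-1/2 (A + I) D^-1/2. The helper names (`normalized_adjacency`, `graph_ffn`), the chain skeleton, and the weight shapes are illustrative assumptions rather than the authors' code. The paper's actual Graph-FFN and its category-aware adjacency may be defined differently.

```python
import math
import random

def normalized_adjacency(num_kpts, edges):
    """Hypothetical helper: D^-1/2 (A + I) D^-1/2 built from undirected skeleton edges."""
    a = [[1.0 if i == j else 0.0 for j in range(num_kpts)] for i in range(num_kpts)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # one skeleton limb connects keypoints i and j
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_kpts)]
            for i in range(num_kpts)]

def matmul(a, b):
    """Plain-Python product of an (n x k) and a (k x m) list-of-lists matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def graph_ffn(x, adj, w1, w2):
    """Hypothetical Graph-FFN: each linear layer of a transformer FFN is preceded by
    aggregation over skeleton neighbours, and a residual connection is added at the end."""
    h = matmul(matmul(adj, x), w1)                 # aggregate neighbours, project dim -> hidden
    h = [[max(0.0, v) for v in row] for row in h]  # ReLU
    out = matmul(matmul(adj, h), w2)               # aggregate again, project hidden -> dim
    return [[xv + ov for xv, ov in zip(xr, orow)] for xr, orow in zip(x, out)]

# Toy usage: 4 keypoint queries on a chain skeleton, feature dim 2, hidden dim 3.
random.seed(0)
num_kpts, dim, hidden = 4, 2, 3
edges = [(0, 1), (1, 2), (2, 3)]
adj = normalized_adjacency(num_kpts, edges)
x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_kpts)]
w1 = [[random.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]
w2 = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
y = graph_ffn(x, adj, w1, w2)
print(len(y), len(y[0]))  # 4 2
```

In a standard FFN, each keypoint query is transformed independently. In this sketch, keypoints that are not connected in the skeleton (for example 0 and 3 here) never exchange information directly within a single aggregation step. This is one simple way to inject the structural prior the text describes.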
Abstract
Traditional 2D pose estimation models are limited by their category-specific design, which makes them suitable only for predefined object categories. This restriction is especially problematic for novel objects, for which relevant training data are lacking. To address this limitation, category-agnostic pose estimation (CAPE) was introduced. CAPE aims to enable keypoint localization for arbitrary object categories with a single few-shot model that requires only a few support images with annotated keypoints. Conventional CAPE techniques treat keypoints as isolated entities; we depart significantly from them by treating the input pose data as a graph. A graph-based network leverages the inherent geometric relations between keypoints to break symmetry, preserve structure, and better handle occlusions. We validate our approach on the MP-100 benchmark, a comprehensive dataset comprising over 20,000 images spanning over 100 categories. Our solution improves performance by 0.98% in the 1-shot setting, achieving a new state of the art for CAPE. Additionally, we enhance the dataset with skeleton annotations. Our code and data are publicly available.
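The claim that graph structure helps "break symmetry" can be illustrated with a toy example. As a stand-in for the paper's graph-based network, this example assumes plain mean aggregation over each keypoint and its skeleton neighbours. Two keypoints with identical features, such as a left/right pair, are mapped to identical outputs by any function applied to each keypoint independently. Once each keypoint is mixed with its own neighbours, however, their representations diverge. The function names (`mean_aggregate`, `per_keypoint_mlp`), the edges, and the feature values are hypothetical.

```python
def mean_aggregate(features, edges):
    """One round of mean aggregation over each keypoint and its skeleton neighbours
    (row-normalised A + I). A toy stand-in, not the paper's exact operator."""
    n = len(features)
    neighbours = [{i} for i in range(n)]  # every keypoint includes itself (self-loop)
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    dim = len(features[0])
    return [[sum(features[k][d] for k in neighbours[i]) / len(neighbours[i])
             for d in range(dim)]
            for i in range(n)]

def per_keypoint_mlp(x):
    """Any function applied to one keypoint in isolation: identical inputs give identical outputs."""
    return [max(0.0, 2 * x[0] - x[1]), x[0] + x[1]]

# Keypoints 0 and 1 act as a left/right pair with identical appearance features.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
edges = [(0, 2), (1, 3)]  # each member of the pair is attached to a different part

print(per_keypoint_mlp(features[0]) == per_keypoint_mlp(features[1]))  # True
agg = mean_aggregate(features, edges)
print(agg[0], agg[1])  # [0.5, 0.5] [1.5, 1.0]
```

Neighbourhood context is what separates the pair here. A learned graph layer would use trained weights instead of a plain mean, but the same mechanism applies.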
