3D Affordance Keypoint Detection for Robotic Manipulation
Zhiyang Liu, Ruiteng Zhao, Lei Zhou, Chengran Yuan, Yuwei Wu, Sheng Guo, Zhengshen Zhang, Chenchen Liu, Marcelo H Ang
TL;DR
The paper tackles affordance-informed robotic manipulation by introducing 3D keypoints that specify where and how to interact with object parts. It presents FAKP-Net, a fusion-based network that takes RGB-D input and jointly predicts per-point affordance labels and four 3D keypoints per affordance region, using mean-shift clustering to obtain robust execution cues. On the UMDKP dataset, FAKP-Net achieves state-of-the-art affordance segmentation and keypoint detection, with significant improvements on the PCK3D and NMSE metrics, and real-world tests show reliable manipulation of unseen objects. The paper also outlines avenues for extending the approach to more complex environments.
Abstract
This paper presents a novel approach for affordance-informed robotic manipulation that introduces 3D keypoints to enhance the understanding of object parts' functionality. The proposed approach provides direct information about the potential uses of objects, as well as guidance on where and how a manipulator should engage, whereas conventional methods treat affordance detection as a semantic segmentation task and answer only the "what" question. To address this gap, we propose the Fusion-based Affordance Keypoint Network (FAKP-Net), which introduces a 3D keypoint quadruplet and harnesses the synergy of RGB and depth images to provide information on execution position, direction, and extent. Benchmark testing demonstrates that FAKP-Net outperforms existing models by significant margins on both the affordance segmentation and keypoint detection tasks. Real-world experiments also showcase the reliability of our method in accomplishing manipulation tasks with previously unseen objects.
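The TL;DR mentions mean-shift clustering as the step that turns many per-point predictions into a robust keypoint. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes each point in an affordance region casts one 3D vote for a keypoint location, and it uses a flat-kernel mean shift to find the most supported mode. The function name `mean_shift_mode`, the `bandwidth` value, and the vote format are hypothetical choices made for illustration.

```python
import math


def mean_shift_mode(votes, bandwidth=0.02, max_iter=50, tol=1e-6):
    """Return the best-supported mode of a list of 3D votes.

    Assumption (not from the paper): `votes` is a list of (x, y, z) tuples in
    metres, one per point of an affordance region, and `bandwidth` is the
    radius of a flat (uniform) kernel.
    """
    best_mode, best_support = None, -1
    for seed in votes:
        center = tuple(seed)
        for _ in range(max_iter):
            # Collect every vote that falls inside the kernel window.
            neighbors = [v for v in votes if math.dist(v, center) <= bandwidth]
            if not neighbors:
                break
            # Shift the window to the mean of the votes it contains.
            new_center = tuple(sum(axis) / len(neighbors) for axis in zip(*neighbors))
            shift = math.dist(new_center, center)
            center = new_center
            if shift < tol:
                break
        # Score the converged window by how many votes it covers.
        support = sum(1 for v in votes if math.dist(v, center) <= bandwidth)
        if support > best_support:
            best_mode, best_support = center, support
    return best_mode


# Hypothetical votes: five agree on one location, two are outliers.
example_votes = [
    (0.10, 0.20, 0.30),
    (0.11, 0.20, 0.30),
    (0.09, 0.20, 0.30),
    (0.10, 0.21, 0.30),
    (0.10, 0.19, 0.30),
    (0.50, 0.50, 0.50),
    (0.00, 0.90, 0.10),
]
keypoint = mean_shift_mode(example_votes)
```

Taking the densest mode rather than the plain average keeps isolated outlier votes from dragging the keypoint away from the consensus location. Under the same assumption, running this once per keypoint would yield the four keypoints of a quadruplet.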
