3D Affordance Keypoint Detection for Robotic Manipulation

Zhiyang Liu, Ruiteng Zhao, Lei Zhou, Chengran Yuan, Yuwei Wu, Sheng Guo, Zhengshen Zhang, Chenchen Liu, Marcelo H Ang

TL;DR

The paper tackles affordance-informed robotic manipulation by introducing 3D keypoints that specify where and how to interact with object parts. It presents FAKP-Net, a fusion-based network that jointly predicts per-point affordance labels and four 3D keypoints per affordance region, using RGB-D data and mean-shift clustering to obtain robust execution cues. On the UMDKP dataset, FAKP-Net achieves state-of-the-art affordance segmentation and keypoint detection, with significant improvements in the PCK3D and NMSE metrics, and real-world tests with unseen objects show that its predictions support reliable affordance-guided manipulation. The paper also outlines concrete avenues for extending the approach to more complex environments.

Abstract

This paper presents a novel approach for affordance-informed robotic manipulation by introducing 3D keypoints to enhance the understanding of object parts' functionality. Conventional methods treat affordance detection as a semantic segmentation task, focusing solely on answering the "what" question. The proposed approach, by contrast, provides direct information about what the potential use of an object is, as well as guidance on where and how a manipulator should engage. To address this gap, we propose a Fusion-based Affordance Keypoint Network (FAKP-Net), which introduces a 3D keypoint quadruplet that harnesses the synergistic potential of RGB and depth images to provide information on execution position, direction, and extent. Benchmark testing demonstrates that FAKP-Net outperforms existing models by significant margins on both the affordance segmentation and keypoint detection tasks. Real-world experiments also showcase the reliability of our method in accomplishing manipulation tasks with previously unseen objects.

Paper Structure

This paper contains 8 sections, 7 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Affordance-informed manipulation pipeline via 3D keypoints. Each affordance region is represented by four 3D keypoints that FAKP-Net predicts from an RGB-D image. The keypoints are interpreted as action position, direction, and extent. The manipulator completes the task using this execution information.
  • Figure 2: Overview of FAKP-Net. The feature encoder processes an RGB-D image to extract per-point features. These features are fed into the affordance segmentation decoder and the keypoint offset decoder, which predict per-point semantic labels and per-point translation offsets relative to the keypoints, respectively. A clustering algorithm then separates different affordance regions that share the same semantic label, and the points within each affordance region vote for its keypoints.
  • Figure 3: Visualization of affordance segmentation and numbered 3D keypoints on the UMDKP test dataset. The affordance categories are color-coded as follows: white for background; yellow for contain; purple for w-grasp; red for grasp; cyan for pound; green for cut; blue for scoop. Four numbered 3D keypoints are attached to each affordance region.
  • Figure 4: Outputs of FAKP and AffKp on previously unseen real-world objects (objects from IKEA).
  • Figure 5: Visualization of affordance segmentation and 3D keypoint quadruplets on previously unseen real-world objects (objects from IKEA).
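
The post-processing described in the Figure 2 caption (cluster points that share a semantic label into separate affordance regions, then let each region's points vote for its keypoints) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the flat-kernel mean-shift, the choice to cluster raw 3D point coordinates, the convention that a vote is `point + offset`, and the use of a plain average to aggregate votes are all assumptions made for illustration; the bandwidth and the synthetic data are likewise hypothetical.

```python
import numpy as np

def mean_shift_labels(points, bandwidth, n_iter=20):
    """Tiny flat-kernel mean-shift (illustrative only): one cluster id per point."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        # Distance from every current mode to every original point.
        dist = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=-1)
        inside = (dist <= bandwidth).astype(float)
        # Move each mode to the mean of the points inside its window.
        modes = inside @ points / inside.sum(axis=1, keepdims=True)
    # Merge modes that converged to (nearly) the same location.
    centers, labels = [], np.empty(len(points), dtype=int)
    for i, mode in enumerate(modes):
        for cid, center in enumerate(centers):
            if np.linalg.norm(mode - center) < bandwidth / 2:
                labels[i] = cid
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels

def split_regions(points, sem_labels, bandwidth, background=0):
    """Cluster each non-background class separately; background gets id -1."""
    region_ids = -np.ones(len(points), dtype=int)
    next_id = 0
    for cls in np.unique(sem_labels):
        if cls == background:
            continue
        mask = sem_labels == cls
        local = mean_shift_labels(points[mask], bandwidth)
        region_ids[mask] = local + next_id
        next_id += local.max() + 1
    return region_ids

def vote_keypoints(points, offsets, region_ids):
    """Average the per-point votes (point + offset, an assumed convention).

    points: (N, 3), offsets: (N, K, 3), region_ids: (N,) -> {region: (K, 3)}
    """
    votes = points[:, None, :] + offsets
    return {int(r): votes[region_ids == r].mean(axis=0)
            for r in np.unique(region_ids) if r >= 0}

# Synthetic scene: two well-separated regions of class 1 plus background points.
rng = np.random.default_rng(0)
blob_a = rng.uniform(-0.03, 0.03, size=(50, 3))
blob_b = rng.uniform(-0.03, 0.03, size=(50, 3)) + np.array([1.0, 0.0, 0.0])
background = rng.uniform(-0.03, 0.03, size=(10, 3)) + np.array([0.0, 1.0, 0.0])
points = np.vstack([blob_a, blob_b, background])
sem_labels = np.array([1] * 100 + [0] * 10)

# Hypothetical ground-truth keypoint quadruplets, one per region.
region_origins = np.array([[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]])
gt_kps = rng.uniform(-0.1, 0.1, size=(2, 4, 3)) + region_origins

# Stand-in for the network output: exact offsets from each point to its keypoints.
offsets = np.zeros((len(points), 4, 3))
offsets[:50] = gt_kps[0] - blob_a[:, None, :]
offsets[50:100] = gt_kps[1] - blob_b[:, None, :]

region_ids = split_regions(points, sem_labels, bandwidth=0.2)
keypoints = vote_keypoints(points, offsets, region_ids)
```

Here `keypoints` maps each recovered region id to a `(4, 3)` array, one row per keypoint in the quadruplet. With real network predictions the votes would be noisy, so a more robust aggregation than the plain mean used here might be preferable.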