ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu

TL;DR

This paper introduces ManiPose, a pioneering benchmark designed to advance the study of pose-varying manipulation tasks, demonstrating notable advancements in pose estimation, pose-aware manipulation, and real-robot skill transfer, setting new standards for POM research.

Abstract

Robotic manipulation in everyday scenarios, especially in unstructured environments, requires pose-aware object manipulation (POM) skills, which adapt a robot's grasping and handling to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it is more effective to grasp it by the rim than by the handle. Despite its importance, research on POM skills remains limited, because learning manipulation skills requires pose-varying simulation environments and datasets. This paper introduces ManiPose, a pioneering benchmark designed to advance the study of pose-varying manipulation tasks. ManiPose encompasses: 1) Simulation environments for POM, featuring tasks that range from 6D pose-specific pick-and-place of single objects to cluttered scenes and interactions with articulated objects. 2) A comprehensive dataset featuring geometrically consistent and manipulation-oriented 6D pose labels for 2936 real-world scanned rigid objects and 100 articulated objects across 59 categories. 3) A baseline for POM that leverages the inference abilities of an LLM (e.g., ChatGPT) to analyze the relationship between 6D pose and task-specific requirements, offering enhanced pose-aware grasp prediction and motion planning capabilities. Our benchmark demonstrates notable advancements in pose estimation, pose-aware manipulation, and real-robot skill transfer, setting new standards for POM research. We will open-source the ManiPose benchmark with the final version of the paper, inviting the community to engage with our resources, available at our website: https://sites.google.com/view/manipose.
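
The mug example in the abstract can be illustrated with a small sketch. This is not the ManiPose baseline (which the abstract describes as LLM-based); it is a hand-written rule under stated assumptions: the function name `choose_mug_grasp`, the `opening_axis_obj` parameter, and the 30-degree tolerance are all hypothetical, and the object's orientation is assumed to be given as a 3x3 rotation matrix from the object frame to a world frame whose Z axis points up.

```python
import math

def rotate(R, v):
    """Apply a 3x3 rotation matrix (nested lists, row-major) to a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def rot_x(angle_rad):
    """Rotation about the world X axis, used here only to build test poses."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def choose_mug_grasp(R_world_obj, opening_axis_obj=(0.0, 0.0, 1.0),
                     upright_tol_deg=30.0):
    """Pick a grasp region from the mug's orientation (illustrative rule only).

    Assumptions (not from the paper): opening_axis_obj is a unit vector in the
    object frame pointing out of the mug's opening, and the world Z axis is up.
    """
    opening_world = rotate(R_world_obj, opening_axis_obj)
    # The dot product with world up (0, 0, 1) is just the Z component.
    cos_tilt = max(-1.0, min(1.0, opening_world[2]))
    tilt_deg = math.degrees(math.acos(cos_tilt))
    # Upright mug: the handle is accessible. Tilted or lying mug: use the rim.
    return "handle" if tilt_deg <= upright_tol_deg else "rim"
```

For instance, `choose_mug_grasp(rot_x(0.0))` selects the handle for an upright mug, while `choose_mug_grasp(rot_x(math.pi / 2))` selects the rim for a mug lying on its side. The point of the sketch is only that the grasp choice is a function of the object's orientation, which is what makes the manipulation pose-aware.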

Paper Structure (17 sections, 7 figures, 1 table)

This paper contains 17 sections, 7 figures, 1 table.

Figures (7)

  • Figure 1: Illustration of the influence of object poses for manipulation.
  • Figure 2: Pose-aware manipulation environments in the ManiPose benchmark. (a) Single object with pose variation: Pick an object from the table and place it at a target pose. The initial and target poses (positions and orientations) are randomly selected. (b) Multiple objects in a cluttered scene: A pile of objects is placed on the table. Pick one or more objects and place them at target poses. (c) Articulated object interaction: Put objects into, or take them out of, a cabinet drawer or door. The robot arm needs to open and close the articulated object by its handle and consider the relative poses of objects inside the cabinet.
  • Figure 3: Type-level object pose alignment based on objects' geometry and functions. X, Y, and Z axes are represented in red, green, and blue. In the drawer and microwave, the joint axis is represented in cyan. Axially symmetric objects: the X-axis is the axis of symmetry. Mirror-symmetric objects: the X-Y plane is the plane of symmetry, and the X-Z plane may be a second plane of symmetry along the longer dimension. Functional objects: these have functional and gripping areas. The X-axis is the long-axis direction; the Y-axis is the grasp approach direction.
  • Figure 4: Object pose labeling. Original: mug pose labeling in the OmniObject3D (Wu et al., 2023), YCB (Calli et al., 2015), and PACE (You et al., 2023) datasets. ManiPose: unified mug pose labeling in the ManiPose dataset. X, Y, and Z axes are represented in red, green, and blue.
  • Figure 5: Pose-aware object manipulation baseline. (a) Pose-invariant grasp pose prediction: generate grasp pose (GP) candidates by converting objects to a pose-invariant base coordinate frame. (b) Action primitive planning: plan a trajectory with the action primitives Move to, Grasp at, and Release. (c) Execute the planned grasp poses and action primitives at each step.
  • ...and 2 more figures
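
The Figure 5 caption describes grasp poses defined in a pose-invariant object frame and a plan built from the primitives Move to, Grasp at, and Release. The following is a minimal sketch of that idea, not the authors' implementation: the helper names (`pose`, `matmul4`, `position`, `plan_pick_and_place`), the primitive labels as strings, and the restriction to yaw-only rotations are all assumptions made for illustration. Poses are 4x4 homogeneous transforms stored as nested lists.

```python
import math

def pose(x, y, z, yaw=0.0):
    """Homogeneous transform: rotation by yaw about Z, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s,  c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def position(T):
    """Translation part of a homogeneous transform."""
    return (T[0][3], T[1][3], T[2][3])

def plan_pick_and_place(T_world_obj, T_obj_grasp, T_world_target):
    """Build a primitive sequence for moving an object to a target pose.

    T_obj_grasp is a grasp pose expressed in the object's own frame, so it
    does not change when the object is moved or rotated (hypothetical setup).
    """
    # Gripper pose in the world frame at pick time.
    grasp_at_pick = matmul4(T_world_obj, T_obj_grasp)
    # The gripper keeps the same pose relative to the object, so the release
    # pose follows directly from the object's target pose.
    grasp_at_place = matmul4(T_world_target, T_obj_grasp)
    return [("move_to", grasp_at_pick),
            ("grasp_at", grasp_at_pick),
            ("move_to", grasp_at_place),
            ("release", grasp_at_place)]
```

As a usage example, with the object at `pose(0.5, 0.0, 0.0)`, a grasp offset of `pose(0.1, 0.0, 0.0)` in the object frame, and a target of `pose(0.0, 0.3, 0.0, math.pi / 2)`, the plan moves the gripper to roughly (0.6, 0.0, 0.0) for the grasp and to roughly (0.0, 0.4, 0.0) for the release. Storing the grasp in the object frame is the design choice that lets one grasp candidate be reused across arbitrary initial and target poses.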