kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation
Lucas Manuelli, Wei Gao, Peter Florence, Russ Tedrake
TL;DR
This work tackles category-level robotic manipulation where object instances vary widely in shape and topology. It introduces kPAM, which uses semantic 3D keypoints as a task-aware object representation so that manipulation targets can be expressed as geometric costs and constraints on the keypoints. The pipeline factors perception and action into instance segmentation, 3D keypoint detection, optimization-based action planning, and dense-geometry-based execution, which lets it generalize robustly to never-before-seen objects. Hardware experiments with shoes and mugs demonstrate centimeter-level precision and successful category-level manipulation, highlighting the approach's interpretability and practicality for real-world robotics.
Abstract
We would like robots to achieve purposeful manipulation by placing any instance from a category of objects into a desired set of goal states. Existing manipulation pipelines typically specify the desired configuration as a target 6-DOF pose and rely on explicitly estimating the pose of the manipulated objects. However, representing an object with a parameterized transformation defined on a fixed template cannot capture large intra-category shape variation, and specifying a target pose at a category level can be physically infeasible or fail to accomplish the task. For example, knowing the pose and size of a coffee mug relative to some canonical mug is not sufficient to successfully hang it on a rack by its handle. Hence we propose a novel formulation of category-level manipulation that uses semantic 3D keypoints as the object representation. This keypoint representation enables a simple and interpretable specification of the manipulation target as geometric costs and constraints on the keypoints, which flexibly generalizes existing pose-based manipulation methods. Using this formulation, we factor the manipulation policy into instance segmentation, 3D keypoint detection, optimization-based robot action planning, and local dense-geometry-based action execution. This factorization allows us to leverage advances in these sub-problems and combine them into a general and effective perception-to-action manipulation pipeline. Our pipeline is robust to large intra-category shape variation and topology changes, as the keypoint representation ignores task-irrelevant geometric details. Extensive hardware experiments demonstrate that our method can reliably accomplish tasks with never-before-seen objects in a category, such as placing shoes and mugs with significant shape variation into category-level target configurations.
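To make the idea of "geometric costs on keypoints" concrete, here is a minimal sketch of the simplest possible case. It assumes the task is specified only as squared-distance costs between each detected keypoint and a target position, and that the robot action is a single rigid transform of the object. Under those assumptions the optimum has a closed form (the Kabsch algorithm). The function name `solve_rigid_transform`, the three mug keypoints, and all numeric values are hypothetical illustrations, not taken from the paper, whose formulation also admits general costs and constraints that this sketch does not handle.

```python
import numpy as np


def solve_rigid_transform(keypoints, targets):
    """Return (R, t) minimizing sum_i ||R @ p_i + t - q_i||^2 (Kabsch algorithm)."""
    p = np.asarray(keypoints, dtype=float)  # (N, 3) detected keypoints
    q = np.asarray(targets, dtype=float)    # (N, 3) desired keypoint positions
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (p - p_mean).T @ (q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


# Hypothetical mug keypoints in the camera frame (meters):
# bottom center, top center, handle center.
detected = np.array([
    [0.00, 0.00, 0.00],
    [0.00, 0.00, 0.10],
    [0.06, 0.00, 0.05],
])

# Hypothetical goal: rotate the mug 90 degrees about z and move it.
R_goal = np.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])
t_goal = np.array([0.5, 0.0, 0.2])
targets = detected @ R_goal.T + t_goal

R, t = solve_rigid_transform(detected, targets)
moved = detected @ R.T + t                # keypoints after applying the action
residual = np.linalg.norm(moved - targets, axis=1).max()
```

Because the targets are defined on keypoints rather than on a template pose, the same specification applies unchanged to a mug of different size or shape: only the detected keypoint coordinates change. Adding constraints (for example, keeping the mug axis upright) would require a general nonlinear solver instead of this closed-form step.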
