PCHands: PCA-based Hand Pose Synergy Representation on Manipulators with N-DoF

En Yen Puang, Federico Ceola, Giulia Pasquale, Lorenzo Natale

TL;DR

PCHands learns a universal, cross-morphology hand pose representation for dexterous manipulation. Each manipulator pose is described by 22 anchors $\alpha \in \mathbb{R}^{22\times3}$, which a CVAE conditioned on manipulator identity encodes into a latent $z$; linear PCA then reduces $z$ to a variable-length $z'$. End-effector frames are aligned across manipulators via an ICP-based refinement, enabling pose retargeting without heavy optimization post-processing. Empirically, PCHands yields faster RL convergence than a joint-space baseline and robust cross-manipulator demonstration transfer, with real-world zero-shot transfer demonstrated on a 7-DoF arm. The approach supports data-efficient learning and practical deployment across diverse manipulators.
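The PCA step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the latent dimension (8), the random stand-in latents, and the helper names `fit_pca`, `to_pcs`, and `from_pcs` are all assumptions made for illustration. It shows how a variable-length $z'$ can be read as "keep the first $k$ principal components and treat the rest as zero".

```python
import numpy as np


def fit_pca(latents):
    """Fit linear PCA on latent codes of shape (n_samples, latent_dim)."""
    mean = latents.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
    return mean, vt


def to_pcs(z, mean, components, k):
    """Project a latent z onto the first k principal components (z')."""
    return (z - mean) @ components[:k].T


def from_pcs(z_prime, mean, components):
    """Map a variable-length z' back to the full latent space.

    Components beyond len(z') are implicitly treated as zero.
    """
    k = z_prime.shape[-1]
    return z_prime @ components[:k] + mean


rng = np.random.default_rng(0)
# Stand-in latents (assumption): in the paper these would come from the CVAE.
scales = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.2, 0.1, 0.05])
latents = rng.normal(size=(500, 8)) * scales

mean, components = fit_pca(latents)
z = latents[0]

# Reconstruction error of z as more principal components are kept.
errors = [
    float(np.linalg.norm(from_pcs(to_pcs(z, mean, components, k), mean, components) - z))
    for k in range(1, 9)
]
```

Because the principal directions are orthonormal, the reconstruction error cannot increase as $k$ grows, which is what makes a truncated $z'$ a usable low-dimensional action or observation vector.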

Abstract

We consider the problem of learning a common representation for dexterous manipulation across manipulators of different morphologies. To this end, we propose PCHands, a novel approach for extracting hand postural synergies from a large set of manipulators. We define a simplified and unified description format based on anchor positions for manipulators ranging from 2-finger grippers to 5-finger anthropomorphic hands. This enables learning a variable-length latent representation of the manipulator configuration and the alignment of the end-effector frame of all manipulators. We show that it is possible to extract principal components from this latent representation that is universal across manipulators of different structures and degrees of freedom. To evaluate PCHands, we use this compact representation to encode observation and action spaces of control policies for dexterous manipulation tasks learned with RL. In terms of learning efficiency and consistency, the proposed representation outperforms a baseline that learns the same tasks in joint space. We additionally show that PCHands performs robustly in RL from demonstration, when demonstrations are provided from a different manipulator. We further support our results with real-world experiments that involve a 2-finger gripper and a 4-finger anthropomorphic hand. Code and additional material are available at https://hsp-iit.github.io/PCHands/.
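To make the frame-alignment idea concrete, the sketch below shows the rigid-fit step that ICP repeats at every iteration. It is a simplification, not the paper's procedure: it assumes anchor correspondences between two manipulators are already known (so no nearest-neighbour matching is needed), which reduces the problem to a single Kabsch fit. The function name `kabsch` and the synthetic anchors are hypothetical.

```python
import numpy as np


def kabsch(source, target):
    """Rigid fit: find R, t minimising ||source @ R.T + t - target||.

    Both inputs have shape (n_points, 3) with row i of source
    corresponding to row i of target.
    """
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    h = (source - src_c).T @ (target - tgt_c)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = tgt_c - rot @ src_c
    return rot, trans


rng = np.random.default_rng(1)
anchors_a = rng.normal(size=(22, 3))  # stand-in anchors of manipulator A

# Build manipulator B's anchors as a known rigid transform of A's.
angle = np.deg2rad(30.0)
true_rot = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
true_trans = np.array([0.1, -0.2, 0.05])
anchors_b = anchors_a @ true_rot.T + true_trans

rot, trans = kabsch(anchors_a, anchors_b)
aligned = anchors_a @ rot.T + trans
```

With exact correspondences and noise-free data, the fit recovers the transform in one step; a full ICP loop would re-estimate correspondences and repeat this fit until convergence.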

Paper Structure

This paper contains 14 sections, 5 equations, 7 figures, 1 table, 2 algorithms.

Figures (7)

  • Figure 1: The proposed architecture consists of a CVAE, which encodes anchors $\alpha$ into (and decodes them from) a latent synergy space $z$ conditioned on the one-hot manipulator identifier, and a linear PCA, which extracts the most significant DoF $z'$ for representing poses under ADF. The architecture can be used to retarget manipulator poses ($j_\gamma$ to $z$ or $z'$ with an encode-pass, followed by $z$ or $z'$ to $j_\nu$ with a decode-pass) or to directly control manipulator $\nu$ from the common variable-length representation ($z'$ to $j_\nu$) with a decode-pass.
  • Figure 2: Position of the set of 22 anchors on human and anthropomorphic hands, and on 2- and 3-finger grippers. Each color-coded anchor under ADF carries symbolic meaning about the region it represents, consistently across manipulators.
  • Figure 3: (Top) Normalized anchor positions from manipulators with various DoF. Two-finger grippers (dark blue) make mostly planar motions on the X-Y plane, while the others move in all three dimensions. (Bottom) The first and second principal components (PC) of the corresponding manipulator poses from vanilla PCA and our method, CVAE+PCA. Vanilla PCA clusters manipulators according to their morphology, losing representational capacity for hand synergies.
  • Figure 4: Manipulators configured using only the first principal component, with $1^{\text{st}}$ PC $\in \{-3, 3\}$, each in its own refined end-effector frame: Robotiq, WidowX, Fetch, xArm, WSG50, Rethink, Kinova2F, GoogleBot, Kinova3F, Franka, Armar, ergoCub, Schunk, Allegro, Shadow, LEAP, and MANO.
  • Figure 5: Average episode return over training for 2 RL algorithms, 3 manipulators, and 5 tasks. We repeat all experiments with 3 different seeds and report the mean and standard deviation to compare PCHands (N-PC) to our baseline [qin2022one].
  • ...and 2 more figures
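
The conditioning described in the Figure 1 caption can be sketched as follows. This is an illustrative assumption rather than the paper's architecture: it takes the 17 manipulators named in the Figure 4 caption as the identifier vocabulary and conditions by simple concatenation, which is one common way to feed a CVAE encoder. The names `MANIPULATORS`, `one_hot`, and `encoder_input` are hypothetical.

```python
import numpy as np

# Manipulator names as listed in the Figure 4 caption (assumed vocabulary).
MANIPULATORS = [
    "Robotiq", "WidowX", "Fetch", "xArm", "WSG50", "Rethink",
    "Kinova2F", "GoogleBot", "Kinova3F", "Franka", "Armar", "ergoCub",
    "Schunk", "Allegro", "Shadow", "LEAP", "MANO",
]


def one_hot(name):
    """Return the one-hot identifier vector for a manipulator name."""
    vec = np.zeros(len(MANIPULATORS))
    vec[MANIPULATORS.index(name)] = 1.0
    return vec


def encoder_input(anchors, name):
    """Concatenate flattened anchors (22 x 3) with the one-hot identifier."""
    anchors = np.asarray(anchors, dtype=float)
    if anchors.shape != (22, 3):
        raise ValueError(f"expected anchors of shape (22, 3), got {anchors.shape}")
    return np.concatenate([anchors.ravel(), one_hot(name)])


# Example: zero anchors stand in for a real pose of the Allegro hand.
x = encoder_input(np.zeros((22, 3)), "Allegro")
```

Under these assumptions the encoder input has 66 anchor coordinates followed by a 17-dimensional identifier; the decoder would receive the same identifier alongside $z$ so that one latent code can be decoded for a different manipulator.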