Leveraging CVAE for Joint Configuration Estimation of Multifingered Grippers from Point Cloud Data

Julien Merand, Boris Meden, Mathieu Grossard

TL;DR

This work tackles the problem of recovering the joint configuration $\mathcal{Q}$ of a multifingered gripper from a point cloud (PC) $\mathcal{H}$ by learning a conditional variational auto-encoder (CVAE) that implicitly selects valid inverse kinematics (IK) solutions. The method encodes a subset of the gripper PC with a PointNet, samples a latent $z$ from a prior, and decodes it to an estimate $\hat{\mathcal{Q}}$; training uses an ELBO objective that combines an RMSE reconstruction term with KL regularization. Evaluation on MultiDex with the Allegro Hand demonstrates sub-millisecond inference and competitive joint and Cartesian accuracy across three PC representations (Fully Dense, Cluster, Handprint), while the analysis highlights dataset coverage and generalization considerations. The approach offers a practical, robot-centric route to integrating AI-driven configuration estimation into real-time grasp planning, with straightforward training data generation from URDF/CAD models and promising avenues for extension to full hand pose estimation and hyperparameter optimization.
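The training objective named above (RMSE reconstruction plus KL regularization) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a diagonal Gaussian posterior parameterized by a mean and a log-variance, a standard normal prior, and a scalar weight `beta` on the KL term. The function names are hypothetical.

```python
import numpy as np

def rmse(q_true, q_pred):
    """Root-mean-square error over all samples and joints."""
    return float(np.sqrt(np.mean((q_true - q_pred) ** 2)))

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), averaged over the batch.

    Assumption: the prior is a standard normal; the summary only says
    'a prior'.
    """
    kl_per_sample = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(kl_per_sample))

def cvae_loss(q_true, q_pred, mu, log_var, beta=1.0):
    """Negative-ELBO-style loss: reconstruction term plus beta-weighted KL term."""
    return rmse(q_true, q_pred) + beta * kl_to_standard_normal(mu, log_var)

# Toy batch: 4 samples, 16 joints, 8 latent dimensions (sizes are illustrative).
q_true = np.zeros((4, 16))
q_pred = np.full((4, 16), 0.1)
mu = np.zeros((4, 8))
log_var = np.zeros((4, 8))
loss = cvae_loss(q_true, q_pred, mu, log_var, beta=0.5)
```

When the posterior equals the prior (`mu = 0`, `log_var = 0`) the KL term vanishes and the loss reduces to the reconstruction error alone, which is a quick sanity check on the sign conventions.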

Abstract

This paper presents an efficient approach for determining the joint configuration of a multifingered gripper solely from the point cloud data of its poly-articulated chain, as generated by visual sensors, simulations or even generative neural networks. Well-known inverse kinematics (IK) techniques can provide mathematically exact solutions (when they exist) for joint configuration determination based solely on the fingertip pose, but often require post-hoc decision-making by considering the positions of all intermediate phalanges in the gripper's fingers, or rely on algorithms to numerically approximate solutions for more complex kinematics. In contrast, our method leverages machine learning to implicitly overcome these challenges. This is achieved through a Conditional Variational Auto-Encoder (CVAE), which takes point cloud data of key structural elements as input and reconstructs the corresponding joint configurations. We validate our approach on the MultiDex grasping dataset using the Allegro Hand, operating within 0.05 milliseconds and achieving accuracy comparable to state-of-the-art methods. This highlights the effectiveness of our pipeline for joint configuration estimation within the broader context of AI-driven techniques for grasp planning.

Paper Structure

This paper contains 9 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Given a gripper point cloud $\mathcal{H}$ or a set of contact points $\mathcal{P} \subset \mathcal{H}$, our method reconstructs the joint configuration $\mathcal{Q}$. We evaluate this approach in a grasping context using the MultiDex dataset [li2023gendexgrasp].
  • Figure 2: Overview of our method. The left panel illustrates the dataset generation process: given a gripper URDF model, three datasets of PCs (Fully Dense PC, Handprint PC, Cluster PC) are created as described in the dataset generation section, each associated with its joint configuration ($\mathcal{Q}$). The right panel illustrates the model architecture, in which a CVAE generates a joint configuration ($\mathcal{Q}$) from a subset ($\mathcal{S}$) of the gripper PC ($\mathcal{H}$). The encoding step processes the PC with a PointNet and the joint configuration with a fully connected MLP; the two encoded representations are then concatenated and passed through a latent encoder (E). The decoding step (D) combines the encoded PC with a sample $z$ from the latent space (characterized by mean $\mu$ and variance $\sigma$) and reconstructs the joint configuration $\hat{\mathcal{Q}}$ with a fully connected MLP.
  • Figure 3: Evolution of $\beta$ over 250 epochs.
  • Figure 4: Joint distribution of the Allegro Hand for both datasets. Top: Fully Dense PC. Bottom: MultiDex dataset [li2023gendexgrasp].
  • Figure 5: Workspace analysis of the generic and thumb fingers.
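
The data flow described in the Figure 2 caption can be sketched as a small NumPy forward pass. This is only an illustration under stated assumptions, not the authors' network: the layer sizes, the latent dimension, the log-variance parameterization, the standard normal prior used at inference, and all function names are hypothetical, and the weights are random rather than trained. The joint count of 16 matches the Allegro Hand's four fingers with four joints each.

```python
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS = 16    # Allegro Hand: 4 fingers x 4 joints
FEAT_DIM = 64    # hypothetical feature size
LATENT_DIM = 8   # hypothetical latent size

def init_mlp(sizes):
    """Random weights for a fully connected MLP (untrained, illustration only)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply the MLP with ReLU between layers and a linear last layer."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def pointnet_encode(points, layers):
    """PointNet-style encoding: shared per-point MLP, then max pooling over points."""
    return mlp(points, layers).max(axis=0)

params = {
    "pointnet": init_mlp([3, 32, FEAT_DIM]),
    "joint_mlp": init_mlp([N_JOINTS, 32, FEAT_DIM]),
    "latent_enc": init_mlp([2 * FEAT_DIM, 64, 2 * LATENT_DIM]),   # E
    "decoder": init_mlp([FEAT_DIM + LATENT_DIM, 64, N_JOINTS]),   # D
}

def encode(points, q, p):
    """Encode PC subset S and joints Q, concatenate, and predict (mu, log_var)."""
    pc_feat = pointnet_encode(points, p["pointnet"])
    q_feat = mlp(q, p["joint_mlp"])
    stats = mlp(np.concatenate([pc_feat, q_feat]), p["latent_enc"])
    return pc_feat, stats[:LATENT_DIM], stats[LATENT_DIM:]

def decode(pc_feat, z, p):
    """Reconstruct the joint configuration from the PC feature and a latent z."""
    return mlp(np.concatenate([pc_feat, z]), p["decoder"])

def forward_train(points, q, p):
    """Training pass: sample z from the posterior via the reparameterization trick."""
    pc_feat, mu, log_var = encode(points, q, p)
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(LATENT_DIM)
    return decode(pc_feat, z, p), mu, log_var

def forward_infer(points, p):
    """Inference pass: only the PC is available, so z is drawn from the prior."""
    pc_feat = pointnet_encode(points, p["pointnet"])
    return decode(pc_feat, rng.standard_normal(LATENT_DIM), p)

subset_s = rng.uniform(-0.1, 0.1, (256, 3))       # toy stand-in for S
q_true = rng.uniform(-0.5, 0.5, N_JOINTS)         # toy stand-in for Q
q_hat_train, mu, log_var = forward_train(subset_s, q_true, params)
q_hat_infer = forward_infer(subset_s, params)
```

Max pooling over the point axis makes the PC feature independent of point ordering, which is why a PointNet-style encoder suits unordered point clouds. At inference only the PointNet branch and the decoder run, which is consistent with a very cheap forward pass.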