Grasp Synthesis Matching From Rigid To Soft Robot Grippers Using Conditional Flow Matching

Tanisha Parulekar, Ge Shi, Josh Pinskier, David Howard, Jen Jen Chung

TL;DR

Conditional Flow Matching (CFM) is presented as a data-efficient and effective method for transferring grasp strategies, offering a scalable methodology for other soft robotic systems.

Abstract

A representation gap exists between grasp synthesis for rigid and soft grippers. AnyGrasp [1] and many other grasp synthesis methods are designed for rigid parallel grippers, and adapting them to soft grippers often fails to capture the compliant behavior of those grippers, resulting in data-intensive and inaccurate models. To bridge this gap, this paper proposes a novel framework that maps grasp poses from a rigid gripper model to a soft Fin-ray gripper. We use Conditional Flow Matching (CFM), a generative model, to learn this transformation. Our methodology includes a data collection pipeline that generates paired rigid-soft grasp poses. A U-Net autoencoder conditions the CFM model on the object's geometry from a depth image, allowing it to learn a continuous mapping from an initial AnyGrasp pose to a stable Fin-ray gripper pose. We validate our approach on a 7-DOF robot, demonstrating that, when executed by the soft gripper, our CFM-generated poses achieve a higher overall success rate on seen and unseen objects (34% and 46%, respectively) than the baseline rigid poses (6% and 25%, respectively). The improvements are most pronounced for cylindrical objects (50% and 100% success on seen and unseen objects) and spherical objects (25% and 31% success on seen and unseen objects), and the model generalizes to unseen objects. This work presents CFM as a data-efficient and effective method for transferring grasp strategies, offering a scalable methodology for other soft robotic systems.
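
To make the "continuous mapping" concrete, the block below writes out one common formulation of conditional flow matching between paired samples. It is a sketch under stated assumptions, not a reproduction of the paper's own equations, which are not included in this summary: the straight-line interpolation path and the squared-error objective are assumptions, and the symbols $\mathcal{G}_{\text{AnyGrasp}}$ (rigid pose), $\mathcal{G}_{\text{CFM}}$ (target soft pose), $\mathbf{c}$ (depth-image condition vector), $t_c$ (progression parameter), and $\mathbf{v}_\theta$ (learned velocity field) follow the figure captions.

```latex
\begin{aligned}
\mathcal{G}(t_c) &= (1 - t_c)\,\mathcal{G}_{\text{AnyGrasp}} + t_c\,\mathcal{G}_{\text{CFM}},
  \qquad t_c \in [0, 1] \\
\mathcal{L}(\theta) &= \mathbb{E}_{t_c,\,(\mathcal{G}_{\text{AnyGrasp}},\,\mathcal{G}_{\text{CFM}},\,\mathbf{c})}
  \left\| \mathbf{v}_\theta\big(\mathcal{G}(t_c),\, t_c,\, \mathbf{c}\big)
  - \big(\mathcal{G}_{\text{CFM}} - \mathcal{G}_{\text{AnyGrasp}}\big) \right\|^2 \\
\frac{d\mathcal{G}(t_c)}{dt_c} &= \mathbf{v}_\theta\big(\mathcal{G}(t_c),\, t_c,\, \mathbf{c}\big),
  \qquad \mathcal{G}(0) = \mathcal{G}_{\text{AnyGrasp}}
\end{aligned}
```

Under these assumptions, training regresses the velocity field onto the constant displacement between the paired poses, and inference integrates the last equation from $t_c = 0$ to $t_c = 1$ to turn an AnyGrasp pose into a soft gripper pose.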

Paper Structure (12 sections, 5 equations, 4 figures)

Figures (4)

  • Figure 1: Given an AnyGrasp pose generated on an object, the CFM model generates a successful grasp pose for a Fin-ray gripper.
  • Figure 2: An overview of the Conditional Flow Matching (CFM) framework for grasp synthesis mapping. (a) The core concept: the CFM model learns a continuous transformation (a "flow path") from an initial rigid gripper pose generated by AnyGrasp ($\mathcal{G}_{\text{AnyGrasp}}$) to a target soft gripper pose ($\mathcal{G}_{\text{CFM}}$). This transformation is guided by a learned, conditional velocity field $\mathbf{v}_\theta$. (b) The architecture of the feed-forward MLP that parameterizes the velocity field $\mathbf{v}_\theta$. It takes the current grasp pose $\mathcal{G}(t_c)$, the progression parameter $t_c$, and the scene condition vector $\mathbf{c}$ as input and predicts the direction of the flow. (c) The U-Net autoencoder that generates the condition vector $\mathbf{c}$ at its bottleneck. It compresses a raw depth image into a latent vector that captures the object's geometry.
  • Figure 3: The experimental setup and dataset used for learning the grasp pose transformation. (a) The pipeline for validating the CFM model: the robot moves from a home position to a pre-grasp pose derived from $\mathcal{G}_{\text{AnyGrasp}}$ or $\mathcal{G}_{\text{CFM}}$, executes the grasp on the object, and lifts it. (b) Examples from the training and unseen datasets for a variety of objects. Each entry includes the grasp execution, the corresponding point cloud ($\mathcal{P}$), the initial rigid gripper pose from AnyGrasp ($\mathcal{G}_{\text{AnyGrasp}}$, shown in magenta), and the manually adjusted, successful soft gripper pose ($\mathcal{G}_{\text{CFM}}$, shown in green).
  • Figure 4: Benchmarking results comparing the grasp success rates of the CFM-generated soft gripper poses ($\mathcal{G}_{\text{soft}}$) against the original AnyGrasp poses ($\mathcal{G}_{\text{AnyGrasp}}$).
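
The pipeline described in the Figure 2 caption (an MLP velocity field that takes the current pose, the progression parameter, and a condition vector) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the 6-D pose vector, the 8-D condition vector, the layer sizes, the straight-line training path, the forward Euler integrator, and all function names are hypothetical. Treating the rotation components linearly is also a simplification.

```python
import numpy as np

def interpolate_pose(g_rigid, g_soft, t):
    """Point on an assumed straight-line path between paired poses, t in [0, 1]."""
    return (1.0 - t) * g_rigid + t * g_soft

def target_velocity(g_rigid, g_soft):
    """Regression target for the straight-line path: a constant displacement."""
    return g_soft - g_rigid

def init_params(pose_dim, cond_dim, hidden=64, seed=0):
    """Randomly initialised weights for a one-hidden-layer MLP (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    in_dim = pose_dim + 1 + cond_dim  # pose, progression parameter, condition
    return {
        "W1": rng.normal(0.0, 0.1, (hidden, in_dim)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (pose_dim, hidden)),
        "b2": np.zeros(pose_dim),
    }

def mlp_velocity(params, g_t, t, c):
    """Velocity field v_theta(G(t), t, c): concatenate the inputs, apply the MLP."""
    x = np.concatenate([g_t, [t], c])
    h = np.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"]

def cfm_loss(params, g_rigid, g_soft, c, t):
    """Squared error between predicted and target velocity at one sampled t."""
    g_t = interpolate_pose(g_rigid, g_soft, t)
    err = mlp_velocity(params, g_t, t, c) - target_velocity(g_rigid, g_soft)
    return float(np.mean(err ** 2))

def integrate_flow(velocity_fn, g_rigid, c, n_steps=20):
    """Forward Euler integration of dG/dt = v(G, t, c) from t = 0 to t = 1."""
    g = np.array(g_rigid, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        g = g + dt * velocity_fn(g, k * dt, c)
    return g

# Illustrative values only: a made-up rigid pose, soft pose, and condition vector.
g_rigid = np.array([0.10, 0.20, 0.30, 0.00, 0.00, 0.00])
g_soft = np.array([0.15, 0.18, 0.35, 0.10, -0.20, 0.05])
cond = np.zeros(8)

params = init_params(pose_dim=6, cond_dim=8)
untrained_loss = cfm_loss(params, g_rigid, g_soft, cond, t=0.5)

# With a perfectly fitted (oracle) velocity field, integration recovers the soft pose.
oracle = lambda g, t, c: target_velocity(g_rigid, g_soft)
g_pred = integrate_flow(oracle, g_rigid, cond)
```

In this sketch the oracle field stands in for a trained network, so `g_pred` matches `g_soft` up to floating-point error. In a real system the condition vector would come from the U-Net bottleneck and the MLP weights would be fitted by minimising `cfm_loss` over the paired dataset.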