DeformerNet: Learning Bimanual Manipulation of 3D Deformable Objects

Bao Thach, Brian Y. Cho, Shing-Hei Ho, Tucker Hermans, Alan Kuntz

TL;DR

This work tackles the challenge of controlling the 3D shape of deformable objects using partial-view sensing. It introduces DeformerNet, a two-stage learning framework that embeds current and target shapes into a latent space and predicts end-effector actions, with a dense manipulation-point predictor to select grasp locations and an extended architecture that supports bimanual manipulation. Through extensive simulation and real-robot experiments, including ex vivo tissue, it demonstrates generalization to unseen geometries and stiffness, outperforming baseline planners and model-free RL, and enabling practical surgery-inspired tasks such as retraction, tissue wrapping, and tube connecting. The approach offers a data-driven, closed-loop solution for deformable-object shape servoing with potential impact across home, industrial, and surgical robotics.

Abstract

Applications in fields ranging from home care to warehouse fulfillment to surgical assistance require robots to reliably manipulate the shape of 3D deformable objects. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the manipulated object and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn a visual servo controller that computes the desired robot end-effector action to iteratively deform the object toward the target shape. We demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training, including ex vivo chicken muscle tissue. Crucially, using DeformerNet, the robot successfully accomplishes three surgical sub-tasks: retraction (moving tissue aside to access a site underneath it), tissue wrapping (a sub-task in procedures like aortic stent placements), and connecting two tubular pieces of tissue (a sub-task in anastomosis).
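The closed-loop shape servoing described in the abstract (sense the current point cloud, compute an end-effector action toward the goal shape, execute, repeat) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `ToyObject` class, the `toy_policy` function (a proportional step on the centroid error standing in for the learned DeformerNet controller), the Chamfer-distance stopping criterion, and the tolerance value are all assumptions made for the example.

```python
import math

def chamfer(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def shape_servo(sense, policy, execute, goal, tol=1e-3, max_steps=50):
    """Closed loop: sense the current cloud, query the policy, execute the action, repeat."""
    for step in range(max_steps):
        current = sense()
        if chamfer(current, goal) < tol:  # assumed stopping criterion
            return step, True
        execute(policy(current, goal))
    return max_steps, chamfer(sense(), goal) < tol

class ToyObject:
    """Hypothetical stand-in for the robot and object: an action translates every point."""
    def __init__(self, points):
        self.points = list(points)

    def sense(self):
        return list(self.points)

    def execute(self, action):
        self.points = [tuple(c + d for c, d in zip(p, action)) for p in self.points]

def toy_policy(current, goal, gain=0.5):
    """Hypothetical controller: proportional step on the centroid error.
    In the paper this role is played by the learned DeformerNet network."""
    cc, cg = centroid(current), centroid(goal)
    return tuple(gain * (g - c) for c, g in zip(cc, cg))

start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
goal = [(x + 0.2, y, z + 0.4) for x, y, z in start]
obj = ToyObject(start)
steps, converged = shape_servo(obj.sense, toy_policy, obj.execute, goal)
print(steps, converged)
```

Passing `sense`, `policy`, and `execute` as callables keeps the loop independent of how the point cloud is acquired or how the action is computed, so a learned controller could replace `toy_policy` without changing the loop itself.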

Paper Structure (23 sections, 2 equations, 27 figures)

Figures (27)

  • Figure 1: (Top) Overview of our shape-servoing-based 3D deformable object manipulation framework. Our pipeline takes as inputs the current point cloud ($\mathcal{P}_\mathrm{c}$) of the deformable object and a goal point cloud ($\mathcal{P}_\mathrm{g}$). It then predicts where on the object the robot should grasp, i.e., the manipulation point(s) $\mathbf{p}_\mathrm{m}$ (Sec. \ref{sec:mani_point}). Having grasped the object, the robot uses our neural network DeformerNet to compute an action that drives the object toward the goal shape (Sec. \ref{sec:deformernet_details}). After successfully executing the action, the robot senses the current point cloud and feeds it back to DeformerNet to close the control loop. (Bottom) An example manipulation sequence of a soft pillow-like object using our framework.
  • Figure 2: Importance of manipulation point selection. Leftmost: goal shape; Red box: successful manipulation point; Blue box: failed manipulation point.
  • Figure 3: (Top) Architecture of DeformerNet. The dotted blue box bounds the feature extraction stage, and the dotted red box bounds the deformation control stage. The feature extraction stage takes as inputs the current point cloud ($\mathcal{P}_\mathrm{c}$) and the goal point cloud ($\mathcal{P}_\mathrm{g}$), passes each through its corresponding feature extraction module, and produces two 256-dimensional vectors. These vectors are concatenated to form a final 512-dimensional shape feature vector. The deformation control stage takes in this shape feature vector, passes it through a series of fully-connected layers, and outputs an action that drives the object toward the goal shape. (Bottom) Architecture of the feature extraction module. It consists of three sequential PointConv [PointConv2019] convolutional layers. The feature extractor $g_g$ of $\mathcal{P}_\mathrm{g}$ takes only the point cloud as input. For the current point cloud $\mathcal{P}_\mathrm{c}$, $g_c$ additionally takes in the two manipulation point positions.
  • Figure 4: Architecture of the dense predictor network, our manipulation point selection method. We use the feature extraction module of DeformerNet as the encoder, and the feature propagation (FP) module of PointConv [PointConv2019] as the decoder. The outputs of the encoder-decoder series are concatenated to form a feature point cloud of shape $128\times P$, then passed through a series of 1D convolution layers to output two vectors of shape $1\times P$, which are finally normalized to the range 0 to 1. The two manipulation points are then defined as the points with the highest value in each vector. The red and blue spheres mark the points that the dense predictor predicts to be the best manipulation points for the two robot arms.
  • Figure 5: (Top left) Simulation setup for single-arm DeformerNet experiments, showing a patient-side manipulator of the dVRK in Isaac Gym. (Top right) Simulation setup in Isaac Gym for dual-arm DeformerNet experiments. (Bottom left) We train and test on a diverse set of object geometries. Here we provide some sample objects from the training dataset. Leftmost are the two box objects with aspect ratios of 1 and 3, respectively. In the middle are the two cylinder objects with aspect ratios of 3 and 8, respectively. Rightmost are the two hemi-ellipsoid objects with aspect ratios of 1 and 4, respectively. (Bottom right) We also challenge our method with manipulating chicken muscle tissue, an object with complex geometry that was unseen during training and outside of the training set distribution.
  • ...and 22 more figures
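
The final selection step described in the Figure 4 caption (two per-point score vectors of shape $1\times P$, normalized to the range 0 to 1, with each arm's manipulation point taken at the highest value) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the raw scores are made-up numbers rather than network outputs, and min-max scaling is an assumed normalization, since the caption does not state which normalization is used.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]. Min-max scaling is an assumption for this sketch."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # degenerate case: all scores equal
    return [(s - lo) / (hi - lo) for s in scores]

def select_manipulation_points(points, scores_arm1, scores_arm2):
    """Pick, for each arm, the point whose normalized score is highest."""
    picks = []
    for scores in (scores_arm1, scores_arm2):
        normalized = min_max_normalize(scores)
        best = max(range(len(points)), key=normalized.__getitem__)
        picks.append(points[best])
    return tuple(picks)

# Hypothetical point cloud with P = 4 points and made-up raw scores per arm.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.3, 0.0, 0.0)]
raw_arm1 = [-1.0, 2.5, 0.3, 0.0]
raw_arm2 = [0.2, 0.1, 0.4, 3.0]

p_m1, p_m2 = select_manipulation_points(points, raw_arm1, raw_arm2)
print(p_m1, p_m2)
```

Because the argmax is unchanged by any monotonic rescaling, the choice of normalization affects only how the scores are read or visualized, not which points are selected.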