Offline Goal-Conditioned Reinforcement Learning for Shape Control of Deformable Linear Objects

Rita Laezza; Mohammadreza Shetab-Bushehri; Gabriel Arslan Waltersson; Erol Özgür; Youcef Mezouar; Yiannis Karayiannidis

TL;DR

This work investigates offline RL as an alternative to shape servoing for planar shape control of a Deformable Linear Object (DLO), and shows that the proposed approach can outperform a shape-servoing baseline in a curvature inversion experiment.

Abstract

Deformable objects present several challenges to the field of robotic manipulation. One of the tasks that best encapsulates the difficulties arising from non-rigid behavior is shape control, which requires driving an object to a desired shape. While shape-servoing methods have been shown to be successful in contexts with approximately linear behavior, they can fail in tasks with more complex dynamics. We investigate an alternative approach, using offline reinforcement learning (RL) to solve a planar shape control problem of a Deformable Linear Object (DLO). To evaluate the effect of material properties, two DLOs are tested, namely a soft rope and an elastic cord. We frame this task as a goal-conditioned offline RL problem and aim to learn a policy that generalizes to unseen goal shapes. Data collection and augmentation procedures are proposed to limit the amount of experimental data that needs to be collected with the real robot. We evaluate the amount of augmentation needed to achieve the best results, and test the effect of regularization through behavior cloning on the TD3+BC algorithm. Finally, we show that the proposed approach is able to outperform a shape-servoing baseline in a curvature inversion experiment.
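To make the behavior cloning regularization mentioned above more concrete, the following is a minimal NumPy sketch of the actor objective from the published TD3+BC algorithm, combined with a goal-conditioned policy input. It is an illustration under stated assumptions, not the authors' implementation: the function names, the batch shapes, and the default weight alpha = 2.5 (the value suggested in the original TD3+BC paper) are assumptions, and concatenating state and goal is only one common way to condition a policy on a goal.

```python
import numpy as np


def goal_conditioned_input(state, goal):
    """Concatenate state and goal features so one policy can serve many goals.

    Hypothetical helper: the abstract only says the problem is goal-conditioned.
    """
    return np.concatenate([state, goal], axis=-1)


def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """TD3+BC actor loss: -lambda * mean(Q) + mean squared behavior cloning error.

    q_values:        Q(s, pi(s)) for each transition in the batch
    policy_actions:  actions proposed by the current policy
    dataset_actions: actions stored in the offline dataset
    alpha:           trade-off weight (assumed default, not taken from this paper)
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_actions = np.asarray(policy_actions, dtype=float)
    dataset_actions = np.asarray(dataset_actions, dtype=float)

    # Normalize the Q term by the batch mean of |Q| so alpha is scale-free.
    # In a gradient-based framework this factor is treated as a constant.
    lam = alpha / (np.mean(np.abs(q_values)) + 1e-8)

    # Behavior cloning term: keep policy actions close to dataset actions.
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)

    return -lam * np.mean(q_values) + bc_term
```

A smaller alpha weights the behavior cloning term more heavily, keeping the policy closer to the recorded actions. A larger alpha lets the learned Q-function dominate.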

Paper Structure (20 sections, 4 equations, 8 figures, 1 table)

INTRODUCTION
RELATED WORK
PROBLEM STATEMENT
METHODS
DLO Tracking
Controller
Offline Goal-Conditioned Reinforcement Learning
Data Collection
Data Augmentation
EXPERIMENTS
Experimental Setup
MDP Formulation
Datasets
Training Procedure
Testing Procedure
...and 5 more sections

Figures (8)

Figure 1: Dual-arm ABB YuMi robot manipulating a DLO on a table. A fixed Intel RealSense camera provides a top view of the workspace, with the field of view shown in the top right corner. The DLO state is shown as a set of points, with the currently tracked shape in green and the goal shape in red.
Figure 2: Block diagram showing the system architecture. Blue blocks indicate hardware components, i.e. the camera and the robot. The green block is the RL Policy, modeled as an ANN, which is trained using the proposed method. Red indicates the user input, namely a desired shape of the DLO. Finally, gray blocks constitute the other software components for perception, i.e. DLO Tracking, and for control, i.e. Trajectory Generation, Forward Kinematics and Controller. A PD velocity controller is used to track the generated end-effector velocity trajectories, based on a given RL policy action. If a new action is received before the previous one is completed, the new trajectories use the current end-effector poses and velocities as a starting point. Joint velocities are computed via an inverse kinematics formulation, solved through HQP optimization.
Figure 3: Overlay of three possible DLO shapes with identical gripper poses. One can intuitively picture that for the top and bottom shapes, an additional top-down translation motion was necessary, while for the middle shape a counterclockwise rotation of the grippers must have occurred.
Figure 4: Flowchart of the DLO tracking procedure. An RGB-D camera provides the RGB feed shown at the top, as well as the point cloud of the scene. The tracking procedure has the following sequence: (1) the RGB image is segmented based on the HSV value of the DLO; (2) the reference point cloud (green) is aligned with the point cloud of the DLO (yellow), and the surrounding lattice (pink) is constrained to the reference point cloud; (3) tracking begins by iteratively finding correspondences between the point cloud of the DLO and the reference, which in turn deforms the lattice using the ARAP principle; finally, (4) the points in the lattice are used to compute a set of N points along the DLO.
Figure 5: Illustration of the safe workspace $\mathcal{W}_k$ in gray. The workspace of each end-effector is smaller than the field of view of the camera to prevent the DLO from leaving the visible region. Further, each end-effector stays in a separate region to prevent the DLO from becoming entangled. This also helps prevent end-effector collisions, making it unnecessary to include additional inequality constraints in the HQP formulation.
...and 3 more figures
