Sim-to-Real Reinforcement Learning for Deformable Object Manipulation

Jan Matas, Stephen James, Andrew J. Davison

TL;DR

This work extends end-to-end reinforcement learning to deformable object manipulation by training in simulation with domain randomisation and transferring to the real world, where the agents have never seen a real deformable object. It builds on DDPG with several enhancements (prioritised replay, N-step returns, demonstrations via DDPGfD, behavioural cloning (BC), reset-to-demo, TD3, asymmetric actor-critic) and introduces a cloth-focused simulation environment with anchor-based grasps. The approach achieves high performance in simulation and meaningful real-world success across three tasks. Ablations identify the key contributors and the main limitations, particularly grasp reliability and sensitivity to camera setup. The study highlights the need for better deformable-object support in simulators to enable broader benchmarking and progress in real-world cloth manipulation.

Abstract

We have seen much recent progress in rigid object manipulation, but interaction with deformable objects has notably lagged behind. Due to the large configuration space of deformable objects, solutions using traditional modelling approaches require significant engineering work. Perhaps then, bypassing the need for explicit modelling and instead learning the control in an end-to-end manner serves as a better approach? Despite the growing interest in the use of end-to-end robot learning approaches, only a small amount of work has focused on their applicability to deformable object manipulation. Moreover, due to the large amount of data needed to learn these end-to-end solutions, an emerging trend is to learn control policies in simulation and then transfer them over to the real world. To date, no work has explored whether it is possible to learn and transfer deformable object policies. We believe that if sim-to-real methods are to be employed further, then it should be possible to learn to interact with a wide variety of objects, and not only rigid objects. In this work, we use a combination of state-of-the-art deep reinforcement learning algorithms to solve the problem of manipulating deformable objects (specifically cloth). We evaluate our approach on three tasks: folding a towel up to a mark, folding a face towel diagonally, and draping a piece of cloth over a hanger. Our agents are fully trained in simulation with domain randomisation, and then successfully deployed in the real world without having seen any real deformable objects.

Paper Structure

This paper contains 18 sections, 7 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: We learn robot policies in simulation and test them in the real world. The algorithm was evaluated on 3 different tasks: folding a large towel up to a tape mark (top row), hanging a small towel on a hanger (middle row) and diagonally folding a square piece of cloth (bottom row).
  • Figure 2: Examples of domain randomisation for the hanger environment. During randomisation, we vary the table textures, cloth and arm colours, light position, camera position and orientation, cloth size and position, hanger size and position, initial arm position and size of arm base.
  • Figure 3: The network architecture uses 3 different inputs: RGB images from the camera looking at the scene, joint angles and gripper position (available at test time from the robot API), and the full state, which is only available at training time. The top half of the figure corresponds to the actor, while the bottom half corresponds to the twin critics. The actor receives joint angles, gripper position and RGB images, while the critics receive the full low-dimensional state. Auxiliary outputs of the actor are only used during training to help the network quickly recognise essential scene features. However, they were also useful for debugging at test time, because we can plot the estimates of cloth position and target position to verify that the actor understands the scene.
  • Figure 4: Ablation studies on the Diagonal Folding task, where "Ours" shows the result of the algorithm with all improvements. The reward for success was set to be 100, and therefore it is equal to the percentage of successes. Two evaluation episodes were performed after each epoch. Curves report the mean of 2 random seeds, and they were smoothed to improve legibility.
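
The domain randomisation described in the Figure 2 caption can be illustrated with a minimal sketch of per-episode sampling for the hanger environment. This is not the authors' implementation: the class name HangerSceneParams, the function sample_hanger_scene, every field name, and all numeric ranges below are assumptions chosen for illustration. Only the list of randomised quantities comes from the caption.

```python
import random
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class HangerSceneParams:
    """One randomised scene. Field names are hypothetical; the quantities
    mirror the list in the Figure 2 caption."""
    table_texture_id: int
    cloth_rgb: Vec3
    arm_rgb: Vec3
    light_position: Vec3
    camera_position_offset: Vec3
    camera_orientation_offset: Vec3
    cloth_size: float
    cloth_position_offset: Vec3
    hanger_size: float
    hanger_position_offset: Vec3
    initial_arm_offset: Vec3
    arm_base_size: float


def _vec3(rng: random.Random, low: float, high: float) -> Vec3:
    """Draw three independent uniform samples in [low, high]."""
    return (rng.uniform(low, high), rng.uniform(low, high), rng.uniform(low, high))


def sample_hanger_scene(rng: random.Random, num_table_textures: int = 10) -> HangerSceneParams:
    """Sample a fresh scene configuration at the start of an episode.

    All ranges are illustrative assumptions, not values from the paper.
    """
    return HangerSceneParams(
        table_texture_id=rng.randrange(num_table_textures),
        cloth_rgb=_vec3(rng, 0.0, 1.0),
        arm_rgb=_vec3(rng, 0.0, 1.0),
        light_position=_vec3(rng, -1.0, 1.0),
        camera_position_offset=_vec3(rng, -0.05, 0.05),
        camera_orientation_offset=_vec3(rng, -0.1, 0.1),
        cloth_size=rng.uniform(0.2, 0.3),
        cloth_position_offset=_vec3(rng, -0.05, 0.05),
        hanger_size=rng.uniform(0.3, 0.4),
        hanger_position_offset=_vec3(rng, -0.05, 0.05),
        initial_arm_offset=_vec3(rng, -0.1, 0.1),
        arm_base_size=rng.uniform(0.9, 1.1),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled scenes are reproducible
    for episode in range(3):
        print(sample_hanger_scene(rng))
```

Passing an explicit random.Random instance keeps the sampling reproducible per seed, which is convenient when comparing runs across random seeds as in Figure 4.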