Learning to unfold cloth: Scaling up world models to deformable object manipulation

Jack Rome, Stephen James, Subramanian Ramamoorthy

TL;DR

The paper tackles in-air cloth unfolding by scaling a world-model-based reinforcement learning approach. It extends DreamerV2 with depth-derived surface normals, dual-camera observations, and demonstration-driven replay-buffer augmentation to handle the complex dynamics of deformable cloth. In simulation and zero-shot real-world tests across multiple garment types, the method achieves improved unfolding performance, with an average real-world success around 74%, while highlighting practical limitations such as depth-sensor noise and single-arm manipulation. The work demonstrates the potential of geometry-focused observation spaces and demonstration-informed training to enable real-time, robust manipulation of deformable objects in service robotics contexts.

Abstract

Learning to manipulate cloth is both a paradigmatic problem for robotic research and a problem of immediate relevance to a variety of applications ranging from assistive care to the service industry. The complex physics of the deformable object makes this problem of cloth manipulation nontrivial. In order to create a general manipulation strategy that addresses a variety of shapes, sizes, fold and wrinkle patterns, in addition to the usual problems of appearance variations, it becomes important to carefully consider model structure and its implications for generalisation performance. In this paper, we present an approach to in-air cloth manipulation that uses a variation of a recently proposed reinforcement learning architecture, DreamerV2. Our implementation modifies this architecture to utilise surface normals as input, in addition to modifying the replay buffer and data augmentation procedures. Taken together, these modifications represent an enhancement to the world model used by the robot, addressing the physical complexity of the object being manipulated. We present evaluations both in simulation and in a zero-shot deployment of the trained policies in a physical robot setup, performing in-air unfolding of a variety of different cloth types, demonstrating the generalisation benefits of our proposed architecture.

Paper Structure

This paper contains 13 sections, 2 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: System diagram of the learning architecture, with our additions in yellow. Image, state, action, and reward trajectories are denoted I, S, A, R, and the latent space is denoted Z. Depth images are converted into surface-normal images for the image state, demonstration episodes are loaded into the replay buffer at start-up, and random batch augmentation is applied while sampling images to train the world model.
  • Figure 2: Pipeline from depth image to surface-normal image. Higher-resolution depth images are output as 64x64x3 surface-normal images. The same algorithm is applied in real time in both simulation and real-world environments. The Sobel filters use a kernel size of 9, and the input/output resolution used in this process is a square 256x256.
  • Figure 3: Simultaneous captures of the Franka robot unfolding a towel with views of surface normals, RGB, and depth, respectively
  • Figure 4: Example unfolds of a t-shirt garment: two failure cases, two near-success cases, and two success cases, respectively.
  • Figure 5: Smoothed chart of normalized rewards for our model, the vanilla DreamerV2 model, SAC, SAC on vector inputs, and a pick/place script. All image inputs are surface normals. Early stopping is triggered at 80% unfold.
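
The depth-to-normals step described in the Figure 2 caption could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sobel_kernels`, `depth_to_normals`) are hypothetical, the size-9 Sobel filter is built as a separable binomial-smoothing and central-difference pair, gradients are taken per pixel without camera intrinsics, and the 256x256 normal map is assumed to be block-averaged down to 64x64x3 (the caption gives both resolutions but does not say how one becomes the other).

```python
import numpy as np


def sobel_kernels(ksize=9):
    """Return 1-D (smoothing, derivative) kernels of an extended Sobel filter."""
    if ksize < 3 or ksize % 2 == 0:
        raise ValueError("ksize must be an odd integer >= 3")
    smooth = np.array([1.0])
    for _ in range(ksize - 1):
        smooth = np.convolve(smooth, [1.0, 1.0])  # binomial row of length ksize
    deriv = np.array([-1.0, 0.0, 1.0])
    for _ in range(ksize - 3):
        deriv = np.convolve(deriv, [1.0, 1.0])  # smoothed central difference
    smooth /= smooth.sum()
    # Scale so that a ramp with slope 1 per pixel gives a gradient of 1.
    deriv /= np.dot(deriv, np.arange(ksize) - ksize // 2)
    return smooth, deriv


def _filter_1d(img, kernel, axis):
    """Correlate a 2-D image with a 1-D kernel along one axis (edge padding)."""
    pad = len(kernel) // 2
    widths = [(0, 0), (0, 0)]
    widths[axis] = (pad, pad)
    padded = np.pad(img, widths, mode="edge")
    # np.convolve flips its second argument, so flip first to get correlation.
    return np.apply_along_axis(np.convolve, axis, padded, kernel[::-1], mode="valid")


def depth_to_normals(depth, ksize=9, out_size=64):
    """Convert an HxW depth image into an out_size x out_size x 3 uint8 normal image."""
    depth = np.asarray(depth, dtype=np.float64)
    smooth, deriv = sobel_kernels(ksize)
    # Sobel = derivative along one axis, smoothing along the other.
    dz_dx = _filter_1d(_filter_1d(depth, deriv, axis=1), smooth, axis=0)
    dz_dy = _filter_1d(_filter_1d(depth, deriv, axis=0), smooth, axis=1)
    # Simplification: per-pixel gradients, no camera intrinsics.
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # Assumed downsampling step: average square blocks, then renormalise.
    h, w = depth.shape
    fy, fx = h // out_size, w // out_size
    normals = normals[: fy * out_size, : fx * out_size]
    normals = normals.reshape(out_size, fy, out_size, fx, 3).mean(axis=(1, 3))
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # Map components from [-1, 1] to [0, 255] so the result is an ordinary image.
    return np.round((normals + 1.0) * 127.5).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_depth = 0.8 + 0.01 * rng.standard_normal((256, 256))  # synthetic depth
    print(depth_to_normals(fake_depth).shape)
```

A wide kernel such as the size-9 one in the caption smooths the gradient estimate over a larger neighbourhood, which is one plausible way to limit the depth-sensor noise mentioned in the TL;DR; the source does not state the motivation for that choice.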