Learning to unfold cloth: Scaling up world models to deformable object manipulation

Jack Rome; Stephen James; Subramanian Ramamoorthy

TL;DR

The paper tackles in-air cloth unfolding by scaling a world-model-based reinforcement learning approach. It extends DreamerV2 with depth-derived surface normals, dual-camera observations, and demonstration-driven replay-buffer augmentation to handle the complex dynamics of deformable cloth. In simulation and zero-shot real-world tests across multiple garment types, the method achieves improved unfolding performance, with an average real-world success around 74%, while highlighting practical limitations such as depth-sensor noise and single-arm manipulation. The work demonstrates the potential of geometry-focused observation spaces and demonstration-informed training to enable real-time, robust manipulation of deformable objects in service robotics contexts.

Abstract

Learning to manipulate cloth is both a paradigmatic problem for robotics research and a problem of immediate relevance to applications ranging from assistive care to the service industry. The complex physics of deformable objects makes cloth manipulation nontrivial. To create a general manipulation strategy that addresses a variety of shapes, sizes, and fold and wrinkle patterns, in addition to the usual problems of appearance variation, it becomes important to carefully consider model structure and its implications for generalisation performance. In this paper, we present an approach to in-air cloth manipulation that uses a variation of a recently proposed reinforcement learning architecture, DreamerV2. Our implementation modifies this architecture to use surface normals as input, in addition to modifying the replay buffer and data augmentation procedures. Taken together, these modifications enhance the world model used by the robot, addressing the physical complexity of the object being manipulated. We present evaluations both in simulation and in a zero-shot deployment of the trained policies on a physical robot setup, performing in-air unfolding of a variety of cloth types and demonstrating the generalisation benefits of our proposed architecture.
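
The replay-buffer change mentioned in the abstract (demonstration episodes loaded into the buffer at start-up, with random augmentation applied when batches are sampled for world-model training) might look roughly like the sketch below. This is not the authors' implementation: the class name `DemoSeededReplayBuffer`, the helper `random_shift`, the episode dictionary keys, and the choice of a random pixel shift as the augmentation are all assumptions made for illustration, since the source does not specify them.

```python
import numpy as np


def random_shift(frames, max_shift, rng):
    """Shift a (T, H, W, C) clip by a random offset, padding with edge pixels.

    ASSUMPTION: random shift stands in for the unspecified augmentation.
    One offset is shared by all frames so the clip stays temporally consistent.
    """
    _, h, w, _ = frames.shape
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(frames, pad, mode="edge")
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    return padded[:, dy:dy + h, dx:dx + w, :]


class DemoSeededReplayBuffer:
    """Hypothetical episode buffer that is pre-filled with demonstrations."""

    def __init__(self, demo_episodes, capacity=1000, seed=0):
        self.capacity = capacity
        self.rng = np.random.default_rng(seed)
        self.episodes = []
        # Demonstrations are inserted before any environment interaction.
        for episode in demo_episodes:
            self.add_episode(episode)

    def add_episode(self, episode):
        # Each episode is a dict of arrays: "image" (T, H, W, C),
        # "action" (T, A), and "reward" (T,). Key names are assumptions.
        self.episodes.append(episode)
        if len(self.episodes) > self.capacity:
            # Simple FIFO eviction; note this would eventually drop demos too.
            self.episodes.pop(0)

    def sample(self, batch_size, seq_len, max_shift=4):
        """Sample fixed-length clips and augment the images of each clip."""
        images, actions, rewards = [], [], []
        for _ in range(batch_size):
            ep = self.episodes[self.rng.integers(len(self.episodes))]
            start = self.rng.integers(len(ep["reward"]) - seq_len + 1)
            window = slice(start, start + seq_len)
            images.append(random_shift(ep["image"][window], max_shift, self.rng))
            actions.append(ep["action"][window])
            rewards.append(ep["reward"][window])
        return {
            "image": np.stack(images),
            "action": np.stack(actions),
            "reward": np.stack(rewards),
        }
```

Every episode is assumed to be at least `seq_len` steps long. Seeding the buffer this way means the first world-model updates already see successful trajectories, which is one plausible reading of why the paper loads demonstrations at start-up.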

Paper Structure (13 sections, 2 equations, 5 figures, 3 tables)

Introduction
Related work
Methodology
States, Actions, and Reward Function
Model
Surface normals in place of RGB
Modifying the replay buffer
Results
Simulation
Benchmarking
Garment Manipulation
Real-world evaluation
Conclusions

Figures (5)

Figure 1: System diagram of the learning architecture, with our additions in yellow. Image, state, action, and reward trajectories are denoted I, S, A, and R, and the latent space is denoted Z. Depth images are converted into surface-normal images for the image state, demonstration episodes are loaded into the replay buffer at start-up, and random augmentation is applied per batch while sampling images for training the world model.
Figure 2: Pipeline from depth image to surface-normal image. Higher-resolution depth images are output as 64x64x3 surface-normal images. The same algorithm is applied in real time in both the simulation and real-world environments. The Sobel filters use a kernel size of 9, and the input/output resolution used in this process is a square 256x256.
Figure 3: Simultaneous captures of the Franka robot unfolding a towel, with views of surface normals, RGB, and depth, respectively.
Figure 4: Example unfolds of a t-shirt garment: 2 failure cases, 2 near-success cases, and 2 success cases, respectively.
Figure 5: Smoothed chart of normalized rewards for our model, the vanilla DreamerV2 model, SAC, SAC on vector inputs, and a pick/place script. All image inputs are surface normals. Early stopping is triggered at 80% unfold.
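
The depth-to-normals conversion described in the Figure 2 caption can be sketched as follows. Only the Sobel kernel size of 9, the 256x256 working resolution, and the 64x64x3 output come from the caption. The function names `sobel_kernels` and `depth_to_normals`, the binomial construction of the 9-tap kernels, the edge padding, and the block-average downsampling are assumptions, so this is an illustrative approximation rather than the authors' pipeline.

```python
import numpy as np


def sobel_kernels(ksize=9):
    """Return normalised separable (smoothing, derivative) Sobel kernels.

    ASSUMPTION: the extended kernels are built from binomial coefficients,
    a common way to generalise the 3x3 Sobel operator to larger sizes.
    """
    if ksize < 3 or ksize % 2 == 0:
        raise ValueError("ksize must be an odd integer >= 3")
    smooth = np.array([1.0])
    for _ in range(ksize - 1):
        smooth = np.convolve(smooth, [1.0, 1.0])
    pre = np.array([1.0])
    for _ in range(ksize - 3):
        pre = np.convolve(pre, [1.0, 1.0])
    deriv = np.convolve(pre, [1.0, 0.0, -1.0])
    # Scale so a unit ramp yields a gradient of exactly 1 per pixel.
    offsets = np.arange(ksize) - ksize // 2
    deriv = deriv / -(offsets * deriv).sum()
    smooth = smooth / smooth.sum()
    return smooth, deriv


def _filter_axis(img, kernel, axis):
    """Convolve a 1-D kernel along one axis, using edge padding."""
    r = len(kernel) // 2
    pad = [(0, 0)] * img.ndim
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
    )


def depth_to_normals(depth, ksize=9, out_size=64):
    """Convert an (H, W) depth image to an (out_size, out_size, 3) normal map."""
    depth = np.asarray(depth, dtype=np.float64)
    smooth, deriv = sobel_kernels(ksize)
    # Horizontal gradient: differentiate along columns, smooth along rows.
    dzdx = _filter_axis(_filter_axis(depth, deriv, 1), smooth, 0)
    # Vertical gradient: smooth along columns, differentiate along rows.
    dzdy = _filter_axis(_filter_axis(depth, smooth, 1), deriv, 0)
    # The surface z = f(x, y) has (unnormalised) normal (-df/dx, -df/dy, 1).
    normals = np.dstack((-dzdx, -dzdy, np.ones_like(depth)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # ASSUMPTION: downsample by averaging blocks, then renormalise.
    h, w = depth.shape
    fy, fx = h // out_size, w // out_size
    normals = normals[: fy * out_size, : fx * out_size]
    small = normals.reshape(out_size, fy, out_size, fx, 3).mean(axis=(1, 3))
    small /= np.linalg.norm(small, axis=2, keepdims=True)
    return small
```

Calling `depth_to_normals(depth)` on a 256x256 depth array returns unit vectors with components in [-1, 1]; mapping them to an 8-bit image, for example with `(n + 1) / 2 * 255`, is a separate step that the caption does not describe. Gradients here are in depth units per pixel, so a metric normal map would additionally need the camera intrinsics.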
