Learning to unfold cloth: Scaling up world models to deformable object manipulation
Jack Rome, Stephen James, Subramanian Ramamoorthy
TL;DR
The paper tackles in-air cloth unfolding by scaling up a world-model-based reinforcement learning approach. It extends DreamerV2 with depth-derived surface normals, dual-camera observations, and demonstration-driven replay-buffer augmentation to handle the complex dynamics of deformable cloth. In simulation and in zero-shot real-world tests across multiple garment types, the method achieves improved unfolding performance, with an average real-world success rate of around 74%. The paper also highlights practical limitations such as depth-sensor noise and single-arm manipulation. The work demonstrates the potential of geometry-focused observation spaces and demonstration-informed training to enable real-time, robust manipulation of deformable objects in service robotics.
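This excerpt does not describe how the surface normals are computed from depth. The following is a minimal sketch of one standard approach, assuming a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and a depth image in metres. The function `depth_to_normals` is illustrative and is not taken from the authors' code. Each pixel is back-projected to a 3D point, tangent vectors are taken by finite differences along image columns and rows, and their cross product gives the normal, which is then oriented towards the camera.

```python
import numpy as np

def depth_to_normals(depth, fx, fy, cx, cy):
    """Estimate per-pixel unit surface normals from a depth image (hypothetical helper).

    Assumes a pinhole camera looking along +z with intrinsics fx, fy, cx, cy.
    Returns an array of shape (h, w, 3).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    # Back-project every pixel to a 3D point in the camera frame.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)
    # Finite-difference tangents along rows (axis 0) and columns (axis 1).
    d_dv, d_du = np.gradient(points, axis=(0, 1))
    normals = np.cross(d_du, d_dv)
    # Normalise, guarding against zero-length vectors (e.g. invalid depth).
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.maximum(norm, 1e-8)
    # Flip normals that point away from the camera so all face the sensor.
    away = normals[..., 2] > 0
    normals[away] *= -1
    return normals
```

For a flat plane at constant depth facing the camera, every normal comes out as `(0, 0, -1)`. Compared with raw depth, normals describe local surface orientation independently of the distance to the camera. That is one plausible reason a geometry-focused input suits wrinkled cloth, though this rationale is an interpretation, not a claim made in the excerpt.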
Abstract
Learning to manipulate cloth is both a paradigmatic problem for robotics research and a problem of immediate relevance to applications ranging from assistive care to the service industry. The complex physics of deformable objects makes cloth manipulation nontrivial. To create a general manipulation strategy that handles a variety of shapes, sizes, and fold and wrinkle patterns, as well as the usual problem of appearance variation, it is important to consider model structure carefully, along with its implications for generalisation performance. In this paper, we present an approach to in-air cloth manipulation that uses a variation of a recently proposed reinforcement learning architecture, DreamerV2. Our implementation modifies this architecture to take surface normals as input, and it also modifies the replay buffer and data augmentation procedures. Taken together, these modifications enhance the world model used by the robot and address the physical complexity of the object being manipulated. We present evaluations both in simulation and in a zero-shot deployment of the trained policies on a physical robot setup, performing in-air unfolding of a variety of cloth types and demonstrating the generalisation benefits of our proposed architecture.
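The abstract says the replay buffer is modified, and the TL;DR describes this change as demonstration-driven, but neither specifies the mechanism. The sketch below shows one common pattern, offered as an assumption: demonstration episodes are kept in a separate pool, and each training batch draws a fixed fraction of its fixed-length sequences from that pool. DreamerV2 trains on sequence chunks. The class `MixedReplayBuffer` and its parameters `demo_fraction` and `seq_len` are hypothetical names chosen for illustration.

```python
import random

class MixedReplayBuffer:
    """Hypothetical replay buffer mixing agent and demonstration episodes."""

    def __init__(self, demo_fraction=0.25, seq_len=50, seed=0):
        self.agent_episodes = []
        self.demo_episodes = []
        self.demo_fraction = demo_fraction  # share of each batch drawn from demos
        self.seq_len = seq_len              # length of each training sequence
        self.rng = random.Random(seed)

    def add_episode(self, episode, is_demo=False):
        # An episode is a list of per-step transitions; skip ones too short to sample.
        if len(episode) < self.seq_len:
            return
        (self.demo_episodes if is_demo else self.agent_episodes).append(episode)

    def _sample_sequence(self, episodes):
        # Pick an episode uniformly, then a contiguous window of seq_len steps.
        episode = self.rng.choice(episodes)
        start = self.rng.randint(0, len(episode) - self.seq_len)
        return episode[start:start + self.seq_len]

    def sample_batch(self, batch_size):
        if not self.demo_episodes and not self.agent_episodes:
            raise ValueError("replay buffer is empty")
        if not self.agent_episodes:
            n_demo = batch_size
        elif not self.demo_episodes:
            n_demo = 0
        else:
            n_demo = round(batch_size * self.demo_fraction)
        batch = [self._sample_sequence(self.demo_episodes) for _ in range(n_demo)]
        batch += [self._sample_sequence(self.agent_episodes)
                  for _ in range(batch_size - n_demo)]
        self.rng.shuffle(batch)  # avoid ordering demo and agent data within a batch
        return batch
```

Keeping demonstrations in their own pool means they are not crowded out as agent experience accumulates early in training. The 25% default is an illustrative choice, not a value reported in the paper.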
