RaggeDi: Diffusion-based State Estimation of Disordered Rags, Sheets, Towels and Blankets
Jikai Ye, Wanze Li, Shiraz Khan, Gregory S. Chirikjian
TL;DR
This work presents RaggeDi, a diffusion-based framework for estimating the full state of deformable cloth from a single RGB-D image. The cloth state is represented as a translation map $\boldsymbol{\tau} \in \mathbb{R}^{H\times W\times 3}$ that maps each point of a canonical flattened mesh to its position in the deformed configuration. This recasts cloth state estimation as conditional image generation, which RaggeDi solves with a DDPM conditioned on depth. The approach shows strong accuracy and speed in simulation and zero-shot sim-to-real transfer in real-world tests, with optional refinement via point-cloud registration. The method targets robust, real-time cloth manipulation in robotic tasks such as dressing, folding, and covering, and points to future work on more complex topologies and mesh-level diffusion.
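The translation-map representation can be illustrated with a minimal NumPy sketch. It assumes the canonical flattened mesh is a regular $H \times W$ grid in the $z = 0$ plane and that the deformed mesh shares its vertex ordering; the function names, the grid size, and the toy sine-wave deformation are hypothetical and are not taken from the paper.

```python
import numpy as np

def canonical_grid(H, W, side=1.0):
    """Vertices of a flat H x W mesh in the z = 0 plane, shape (H, W, 3).

    Assumption: the canonical mesh is a regular square grid of edge `side`.
    """
    ys, xs = np.meshgrid(
        np.linspace(0.0, side, H), np.linspace(0.0, side, W), indexing="ij"
    )
    return np.stack([xs, ys, np.zeros_like(xs)], axis=-1)

def translation_map(canonical, deformed):
    """Per-vertex translation tau such that canonical + tau = deformed."""
    return deformed - canonical

def apply_translation_map(canonical, tau):
    """Recover the deformed vertex positions from a translation map."""
    return canonical + tau

H, W = 4, 5
flat = canonical_grid(H, W)

# Toy deformation (illustrative only): lift the sheet into a sine wave along x.
deformed = flat.copy()
deformed[..., 2] = 0.1 * np.sin(2.0 * np.pi * flat[..., 0])

tau = translation_map(flat, deformed)        # shape (H, W, 3), image-like
recovered = apply_translation_map(flat, tau)
```

Because `tau` has three channels on an $H \times W$ grid, it can be stored and processed like an RGB image, which is what lets an image generation model predict it.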
Abstract
Cloth state estimation is an important problem in robotics: a robot needs an accurate estimate of the cloth state to manipulate cloth in tasks such as robotic dressing, stitching, and covering or uncovering a person. Accurate estimation remains challenging because cloth is highly flexible and frequently self-occludes. This paper proposes a diffusion-model-based pipeline that formulates cloth state estimation as an image generation problem. The cloth state is represented as an RGB image, the translation map, which encodes the point-wise translation between a pre-defined flattened mesh and the deformed mesh in a canonical space. We then train a conditional diffusion-based image generation model to predict the translation map from an observation. Experiments in both simulation and the real world validate the method, and the results indicate that it outperforms two recent methods in both accuracy and speed.
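To make the conditional diffusion formulation concrete, the sketch below shows the standard DDPM forward (noising) step applied to a translation map, together with one common way of supplying the observation to a denoiser. It is a sketch under stated assumptions, not the paper's implementation: the linear noise schedule, the number of steps, the channel-wise concatenation of the depth image, and all function names are assumptions, since this section does not specify the architecture or the training details.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    Assumption: a linear schedule with these endpoints, as in the original
    DDPM formulation; the paper's schedule may differ.
    """
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def denoiser_input(xt, depth):
    """Stack the noisy map with the depth image along the channel axis.

    Assumption: conditioning by channel concatenation, a common choice for
    image-conditioned diffusion models.
    """
    return np.concatenate([xt, depth[..., None]], axis=-1)

rng = np.random.default_rng(0)
H, W = 8, 8
tau0 = rng.uniform(-1.0, 1.0, size=(H, W, 3))  # stand-in clean translation map
depth = rng.uniform(0.0, 1.0, size=(H, W))     # stand-in depth observation

alpha_bar = make_alpha_bar()
t = 500
tau_t, eps = q_sample(tau0, t, alpha_bar, rng)
net_in = denoiser_input(tau_t, depth)          # shape (H, W, 4)
```

In a typical training setup, a network would take `net_in` and the timestep `t` and be trained to predict `eps`; at inference, the reverse process would start from Gaussian noise and denoise step by step, conditioned on the observed depth, to produce a translation map.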
