TRTM: Template-based Reconstruction and Target-oriented Manipulation of Crumpled Cloths

Wenbo Wang, Gen Li, Miguel Zamora, Stelian Coros

TL;DR

TRTM reconstructs and manipulates crumpled cloths from a single top-view depth image, using a template-based cloth GNN that explicitly recovers the full cloth mesh and its vertex visibilities. Sim-real registration aligns synthetic cloth meshes with real-world configurations, enabling a target-oriented manipulation pipeline that queries a clustered mesh for dual-arm and single-arm actions. The approach reports average vertex errors of around 1.22 cm in simulation and 1.73 cm in the real world, and demonstrates high manipulation success across flat, triangle, and rectangle targets. It generalizes to daily cloths whose topologies are similar to that of the template mesh but that differ in shape, size, pattern, and physical properties. A large synthetic dataset (>120k meshes) and a real-world dataset of 3k configurations accompany the released code and demos, providing explicit, controllable cloth state representations for robotic manipulation.
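To make the reported reconstruction numbers concrete, here is a minimal sketch of one common way to compute an average vertex error. It assumes the metric is the mean Euclidean distance between corresponding predicted and ground-truth mesh vertices; the paper's exact definition may differ. The function name `mean_vertex_error` and the toy coordinates are illustrative and are not taken from the released code.

```python
import math


def mean_vertex_error(pred, gt):
    """Mean Euclidean distance between corresponding vertices.

    Assumption: pred[i] and gt[i] refer to the same template vertex,
    which a template-based mesh provides by construction.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy 4-vertex mesh in metres (illustrative values, not data from the paper).
gt = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]
pred = [(0.0, 0.0, 0.01), (0.1, 0.0, 0.0), (0.0, 0.1, 0.02), (0.1, 0.1, 0.01)]

error_cm = 100.0 * mean_vertex_error(pred, gt)  # convert metres to centimetres
```

Because every reconstructed mesh shares the template's vertex ordering, the correspondence needed by this kind of metric comes for free, which is one practical benefit of an explicit template-based representation.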

Abstract

Precise reconstruction and manipulation of crumpled cloths are challenging due to the high dimensionality of cloth models and the limited observations available in self-occluded regions. We leverage recent progress in single-view human reconstruction to reconstruct crumpled cloths with a template mesh from their top-view depth observations only, using our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstructed mesh explicitly describes the positions and visibilities of all cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulations. Experiments demonstrate that our TRTM system can be applied to daily cloths whose topologies are similar to that of our template mesh but that differ in shape, size, pattern, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website: https://wenbwa.github.io/TRTM/ .

Paper Structure

This paper contains 13 sections, 13 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Different state representations of crumpled cloths. From left to right: top-view color images; top-view depth images or point clouds; sparse visible features or encoded latent vectors; our template-based reconstruction mesh and clustered mesh group for robot manipulation. From top to bottom: a rectangular cloth randomly dragged once, the template square cloth folded twice, and a larger square cloth dropped once.
  • Figure 2: System Overview. a) Sim-real registration of one real-world cloth to a synthetic mass-spring cloth mesh with imitated top-view depth observations. b) Single-view template-based reconstruction of a crumpled cloth from its top-view depth observation only, using our template-based cloth GNN. c) Querying the best visible vertex pairs within the reconstructed and clustered mesh group, according to different target configurations: flat, triangle, and rectangle. d) Dual-arm manipulation using one ABB YuMi Robot at the selected cloth vertex pair with optimized grasp-hang-and-flip trajectories.
  • Figure 3: Template-based GNN. a) Synthetic training with simulated cloth meshes and depth images. From left to right: synthetic cloth dataset and template mesh, image feature encoding and template graph encoding, graph feature updating with attention message flow, mesh decoding and supervising. b) Real-world tuning with collected cloth configurations and depth images. From left to right: real-world cloth dataset, point cloud observation, pixel-wise tuned result from the GNN prediction. In our work, we observe small improvements during the tuning process, as discussed in the Ablation Study.
  • Figure 4: Qualitative Evaluation of our Target-oriented Manipulation. a) Dual-arm flipping by querying visible group vertex pairs according to different target configurations: flat, triangle, and rectangle. b) Single-arm flattening by sequentially dragging the visible group vertex to its canonical target position.
  • Figure 5: Quantitative Evaluation of our Target-oriented Manipulation. a) Dual-arm flipping experiments with flat (red), triangle (blue), and rectangle (green) targets. We demonstrate the flipping results with the simulated ground truth meshes (Simu-GT), with the real-world reconstruction meshes (Real-GT), and value networks (Real-VA, flatten only). b) Real-world single-arm dragging for flattening experiments: ours (orange) and VCD (gray).
  • ...and 1 more figures
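
The single-arm flattening described in Figure 4b (dragging a visible group vertex to its canonical target position) can be sketched as a simple selection step. This is a hypothetical heuristic, not the authors' method: it assumes the next vertex to drag is the visible one currently farthest from its canonical target, and the name `pick_drag_action` and all coordinates are made up for illustration.

```python
import math


def pick_drag_action(positions, targets, visible):
    """Pick the visible vertex farthest from its canonical target.

    Returns (index, start_xy, goal_xy), or None when no visible vertex
    needs to move. The farthest-first rule is an assumption made for
    this sketch; the captions do not state the selection criterion.
    """
    best_index = None
    best_dist = 0.0
    for i, (pos, tgt, vis) in enumerate(zip(positions, targets, visible)):
        if not vis:
            continue  # occluded vertices cannot be grasped from the top view
        dist = math.dist(pos, tgt)
        if dist > best_dist:
            best_index, best_dist = i, dist
    if best_index is None:
        return None
    return best_index, positions[best_index], targets[best_index]


# Toy example: four group vertices in table coordinates (metres).
positions = [(0.0, 0.0), (0.05, 0.02), (0.3, 0.3), (0.1, 0.1)]
targets = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)]
visible = [True, True, False, True]

action = pick_drag_action(positions, targets, visible)
```

In this toy case the occluded vertex at index 2 is the farthest from its target but is skipped, so the sketch selects index 1. This illustrates why explicit per-vertex visibility matters when choosing grasp points.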