Learning Keypoints for Robotic Cloth Manipulation using Synthetic Data
Thomas Lips, Victor-Louis De Gusseme, Francis wyffels
TL;DR
This work tackles the generalization gap in robotic cloth manipulation by introducing a synthetic data pipeline that trains semantic keypoint detectors for almost-flattened clothes, and by validating those detectors on the real-world aRTF dataset. It systematically explores procedural cloth mesh generation, random materials, and NVIDIA FleX-based deformations to produce diverse training images. Detectors trained on synthetic data alone reach a peak mAP of 64.3% and an average keypoint distance (AKD) of 18 px; fine-tuning on real data improves this to 74.2% mAP and 8.7 px AKD. A comparative study shows that single-layer meshes with random materials give the best synthetic training results, while a persistent sim-to-real gap remains that likely requires higher-fidelity assets (e.g., seams, UV maps) to close. The work provides a practical pipeline and dataset for cloth folding research and outlines pathways toward improved realism and interactive perception.
Abstract
Assistive robots should be able to wash, fold or iron clothes. However, due to the variety, deformability and self-occlusions of clothes, creating robot systems for cloth manipulation is challenging. Synthetic data is a promising direction to improve generalization, but the sim-to-real gap limits its effectiveness. To advance the use of synthetic data for cloth manipulation tasks such as robotic folding, we present a synthetic data pipeline to train keypoint detectors for almost-flattened cloth items. To evaluate its performance, we have also collected a real-world dataset. We train detectors for T-shirts, towels and shorts and obtain a mean average precision (mAP) of 64% and an average keypoint distance of 18 pixels. Fine-tuning on real-world data improves performance to 74% mAP and an average keypoint distance of only 9 pixels. Furthermore, we describe failure modes of the keypoint detectors and compare different approaches to obtain cloth meshes and materials. We also quantify the remaining sim-to-real gap and argue that improving the fidelity of cloth assets will be required to reduce it further. The code, dataset and trained models are available
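
To make the average keypoint distance concrete, the following is a minimal Python sketch of the metric as a mean Euclidean pixel distance. It assumes each predicted keypoint has already been paired one-to-one with its ground-truth keypoint; the function name `average_keypoint_distance` and the example coordinates are hypothetical, and the paper's exact matching protocol and handling of missed detections are not specified here.

```python
import math


def average_keypoint_distance(predicted, ground_truth):
    """Mean Euclidean distance (in pixels) between paired keypoints.

    Assumption: predicted[i] corresponds to ground_truth[i]. How pairs are
    formed and how missed detections are handled is left out of this sketch.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one keypoint pair is required")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Hypothetical example: one keypoint is off by (3, 4) px, the other is exact.
predicted = [(103.0, 204.0), (50.0, 60.0)]
ground_truth = [(100.0, 200.0), (50.0, 60.0)]
print(average_keypoint_distance(predicted, ground_truth))  # 2.5
```

Under this reading, an 18 px result means that predicted keypoints lie on average 18 pixels from their annotated positions, so the value depends on the image resolution used for evaluation.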
