GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation

Ruihai Wu, Ziyu Zhu, Yuran Wang, Yue Chen, Jiarui Wang, Hao Dong

TL;DR

Manipulating cluttered garments is hard because garments are deformable and become entangled. The authors propose a point-level affordance, learned from 3D point clouds, that encodes per-point actionability while accounting for garment geometry, structure, and inter-object relations; this affordance guides retrieval. When entanglement prevents direct retrieval, an affordance-guided adaptation module iteratively reorganizes the pile via pick-and-place until it reaches a manipulation-friendly state. Evaluations in GarmentLab across multiple scenes, together with real-world robot experiments, report better performance than the baselines, with practical implications for automating laundry and wardrobe tasks.

Abstract

Cluttered garments manipulation poses significant challenges due to the complex, deformable nature of garments and intricate garment relations. Unlike single-garment manipulation, cluttered scenarios require managing complex garment entanglements and interactions, while maintaining garment cleanliness and manipulation stability. To address these demands, we propose to learn point-level affordance, the dense representation modeling the complex space and multi-modal manipulation candidates, while being aware of garment geometry, structure, and inter-object relations. Additionally, as it is difficult to directly retrieve a garment in some extremely entangled clutters, we introduce an adaptation module, guided by learned affordance, to reorganize highly-entangled garments into states plausible for manipulation. Our framework demonstrates effectiveness over environments featuring diverse garment types and pile configurations in both simulation and the real world. Project page: https://garmentpile.github.io/.

Paper Structure

This paper contains 28 sections, 3 equations, 24 figures, 5 tables.

Figures (24)

  • Figure 1: Point-Level Affordance for Cluttered Garments. A higher score denotes higher actionability for downstream retrieval. Row 1: per-point affordance simultaneously reveals 2 garments suitable for retrieval. Row 2: it is aware of garment structures (grasping an edge causes other parts to touch the floor) and relations (retrieving one garment drags nearby entangled garments out), and thus avoids manipulating points that lead to such failures. Rows 3 and 4: highly tangled garments may have no plausible manipulation points; affordance can guide reorganization of the scene so that garments plausible for manipulation emerge.
  • Figure 2: Framework Overview. Given the observed point cloud, the Affordance Module predicts the initial point-level manipulation (retrieval) affordance score. When actionability is not good enough, the framework proposes an adaptation pick-place action: it first predicts per-point pick affordance and selects the pick point with the highest score, then predicts place affordance conditioned on that pick point and selects the place point. After executing the adaptation action, it receives a new point cloud and generates new affordance. When actionability is good enough, the robot retrieves at the point with the highest affordance score. This loop is executed until all garments are retrieved.
  • Figure 3: Learning Framework of Retrieval, Pick and Place Affordance. Upper-left: the Affordance Module predicts the point-level (retrieval) affordance score for the downstream task. Upper-right: the PointNet++ backbone aggregates both local and global features, which helps incorporate garment geometry, structure and relation information for each point. Lower-right: the Place Module, which predicts the point-level place score conditioned on a pick point for adaptation, is supervised by the trained Affordance Module. Lower-left: the Pick Module, which predicts the point-level pick score for adaptation, is supervised by the Place Module.
  • Figure 4: Example Manipulation Sequences in WashingMachine and Sofa.
  • Figure 5: Retrieval Affordance before and after Adaptation, with the Adaptation Action indicated by Pick Affordance and Place Affordance. When the predicted retrieval affordance is not good enough (columns 1, 2), the adaptation procedure is triggered. The point with the highest score in pick affordance is chosen as $p_{pick}$ (column 3), and the point with the highest score in place affordance is chosen as $p_{place}$ (column 4). After executing pick and place for adaptation, retrieval affordance improves significantly (columns 5, 6).
  • ...and 19 more figures
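
The retrieve-or-adapt loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `run_episode`, the callables passed to it, the `threshold` value standing in for "good enough" actionability, the `max_steps` cap, and the `ToyPile` stand-in scene are all hypothetical names and assumptions introduced here. In the paper the three affordance predictors are learned networks over point clouds; here they are hand-written stubs.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def run_episode(
    observe: Callable[[], List[Point]],
    retrieval_aff: Callable[[List[Point]], List[float]],
    pick_aff: Callable[[List[Point]], List[float]],
    place_aff: Callable[[List[Point], int], List[float]],
    retrieve: Callable[[Point], None],
    pick_and_place: Callable[[Point, Point], None],
    threshold: float = 0.5,  # assumed cutoff for "good enough" actionability
    max_steps: int = 50,     # assumed safety cap, not from the paper
) -> list:
    """Alternate between retrieval and adaptation until the scene is empty."""
    log = []
    for _ in range(max_steps):
        cloud = observe()
        if not cloud:  # nothing left to retrieve
            break
        scores = retrieval_aff(cloud)
        best = argmax(scores)
        if scores[best] >= threshold:
            # Actionability is good enough: retrieve at the best point.
            retrieve(cloud[best])
            log.append(("retrieve", best))
        else:
            # Adaptation: choose the pick point first, then the place
            # point conditioned on that pick point.
            pick_idx = argmax(pick_aff(cloud))
            place_idx = argmax(place_aff(cloud, pick_idx))
            pick_and_place(cloud[pick_idx], cloud[place_idx])
            log.append(("adapt", pick_idx, place_idx))
    return log


class ToyPile:
    """Hypothetical stand-in scene: one point per garment plus an 'entangled' flag."""

    def __init__(self, entangled: Sequence[bool]):
        self.points = [(float(i), 0.0, 0.0) for i in range(len(entangled))]
        self.entangled = list(entangled)

    def observe(self) -> List[Point]:
        return list(self.points)

    def retrieval_aff(self, cloud: List[Point]) -> List[float]:
        return [0.2 if e else 0.9 for e in self.entangled]

    def pick_aff(self, cloud: List[Point]) -> List[float]:
        return [0.8 if e else 0.1 for e in self.entangled]

    def place_aff(self, cloud: List[Point], pick_idx: int) -> List[float]:
        # Toy rule: prefer placing far away from the pick point.
        px = cloud[pick_idx][0]
        return [abs(p[0] - px) for p in cloud]

    def retrieve(self, point: Point) -> None:
        i = self.points.index(point)
        del self.points[i]
        del self.entangled[i]

    def pick_and_place(self, pick: Point, place: Point) -> None:
        # Toy rule: one adaptation action untangles the picked garment.
        self.entangled[self.points.index(pick)] = False


pile = ToyPile([True, False, True])
log = run_episode(pile.observe, pile.retrieval_aff, pile.pick_aff,
                  pile.place_aff, pile.retrieve, pile.pick_and_place)
```

In this toy run the free garment is retrieved first, and each entangled garment needs one adaptation step before its retrieval score clears the assumed threshold. Selecting the place point only after the pick point is fixed mirrors the conditioning described in the captions for Figures 2 and 3.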