UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

Ruihai Wu; Haoran Lu; Yiyan Wang; Yubo Wang; Hao Dong

TL;DR

This paper tackles category-level garment manipulation under diverse geometries and deformations by learning dense visual correspondence that is aware of topology and function. The approach first learns deformation- and object-agnostic point representations through self-supervised cross-deformation ($L_{CD}$) and cross-object ($L_{CO}$) contrastive learning, refined by a coarse-to-fine loss ($L_{C2F}$), and then adapts to specific tasks with few-shot functional fine-tuning. A skeleton-based topology (via Skeleton Merger) enables robust cross-object correspondence, while projection from flat to deformed states unifies the representations across garment states. Extensive simulation across three garment categories and three tasks, plus real-world evaluation with dual-arm manipulation, demonstrates superior generalization and policy generation from dense correspondences, enabling one-model, multi-task garment manipulation with minimal demonstrations.
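The contrastive objective mentioned above can be illustrated with a small sketch. This section names $L_{CD}$ and $L_{CO}$ but does not give their exact form, so the InfoNCE-style loss below, the cosine similarity, the `temperature` value, and all function names are assumptions made for illustration, not the authors' implementation. The sketch takes the feature of one anchor point, the feature of its corresponding point (on the same garment under another deformation, or on another garment), and features of non-corresponding points.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length so that dot() gives cosine similarity."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor point feature (assumed form).

    The loss is low when the anchor is more similar to its corresponding
    point (positive) than to the non-corresponding points (negatives).
    """
    a = normalize(anchor)
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]

# Hypothetical 2-D point features, for illustration only.
anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Under these assumptions, `loss_aligned` is smaller than `loss_misaligned`, which is the signal that pulls corresponding points together in feature space and pushes non-corresponding points apart.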

Abstract

Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, yet highly challenging due to the diversity of garment configurations, geometries and deformations. Although previous works can manipulate similarly shaped garments in a certain task, they mostly have to design different policies for different tasks, cannot generalize to garments with diverse geometries, and often rely heavily on human-annotated data. In this paper, we leverage the property that garments in a certain category have similar structures, and learn topological dense (point-level) visual correspondence among garments at the category level, under different deformations, in a self-supervised manner. The topological correspondence can be easily adapted to functional correspondence to guide the manipulation policies for various downstream tasks, with only one-shot or few-shot demonstrations. Experiments over garments in 3 different categories on 3 representative tasks in diverse scenarios (using one or two arms, taking one or more steps, starting from flat or messy garments) demonstrate the effectiveness of our proposed method. Project page: https://warshallrho.github.io/unigarmentmanip.
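How point-level correspondence can guide a manipulation policy from a single demonstration can be sketched as a nearest-neighbor lookup in feature space. The function names, the use of cosine similarity, and the toy 2-D features below are assumptions for illustration. The abstract states only that correspondence guides the policy, not how matching is computed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def transfer_points(demo_features, demo_pick_indices, novel_features):
    """Map demonstrated manipulation points onto a novel garment.

    For each demonstrated point index, return the index of the novel-garment
    point whose feature is most similar to the demonstrated point's feature.
    """
    matches = []
    for i in demo_pick_indices:
        query = demo_features[i]
        best = max(
            range(len(novel_features)),
            key=lambda j: cosine(query, novel_features[j]),
        )
        matches.append(best)
    return matches

# Hypothetical per-point features for a demonstration garment and a novel one.
demo_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
novel_features = [[0.1, 1.0], [0.9, 0.9], [1.0, 0.05]]

# Demonstrated grasp points are demo points 0 and 1 (e.g., two sleeve ends).
novel_picks = transfer_points(demo_features, [0, 1], novel_features)
print(novel_picks)  # [2, 0]
```

In this sketch, the same learned features would serve every task. Only the demonstrated indices (and the action attached to them) change between, say, folding and hanging.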

Paper Structure (34 sections, 3 equations, 12 figures, 5 tables)

Introduction
Related Work
Dense Representations for Manipulation
Visual Correspondence Learning
Cloth and Garment Manipulation
Problem Formulation
Method
Overview
Self-supervised Topological Dense Visual Correspondence Learning
Cross-Deformation Correspondence
Cross-Object Correspondence
Integration of Cross-Deformation and Cross-Object Correspondence
Coarse-to-fine Correspondence Refinement
From Topological to Functional: Few-shot Adaptation for Downstream Tasks
Manipulation Policy Generation
...and 19 more sections

Figures (12)

Figure 1: Given a demonstration garment (Middle) and the demonstration actions to fulfill a task (Middle-Left/-Right), for a novel object, we find the manipulation points using the proposed Dense Visual Correspondence for Garment Manipulation and execute the corresponding action to fulfill the task (Left/Right). Color similarity denotes similarity in the correspondence space.
Figure 2: Our Proposed Learning Framework for Dense Visual Correspondence. (Left) We extract cross-deformation and cross-object correspondence point pairs using self-play and skeletons, respectively, and train the per-point correspondence scores in a contrastive manner, with the coarse-to-fine module refining the quality. (Middle) The learned correspondence demonstrates point-level similarity across different garments in different deformations. (Right) The learned point-level correspondence can facilitate multiple diverse downstream tasks using one-shot or few-shot demonstrations.
Figure 3: Correspondence-Guided Manipulation on Different Garment Types and Tasks. From left to right: observation, correspondence, manipulation points (colored points) selected via correspondence to the demonstrations, and the manipulation action.
Figure 4: Learned Dense Visual Correspondence. For each category, we show correspondence for 5 objects in different deformations. Color similarity denotes correspondence similarity.
Figure 5: Visualization of Different Folding Policies.
...and 7 more figures
