Learning Bimanual Cloth Manipulation with Vision-based Tactile Sensing via Single Robotic Arm

Dongmyoung Lee; Wei Chen; Xiaoshuai Chen; Rui Zong; Petar Kormushev

TL;DR

Real-world results demonstrate reliable cloth unfolding, even for crumpled fabrics, using only a single robotic arm, and highlight Touch G.O.G. as a compact and cost-effective solution for deformable object manipulation.

Abstract

Robotic cloth manipulation remains challenging due to the high-dimensional state space of fabrics, their deformable nature, and frequent occlusions that limit vision-based sensing. Although dual-arm systems can mitigate some of these issues, they increase hardware and control complexity. This paper presents Touch G.O.G., a compact vision-based tactile gripper with an accompanying perception and control framework for single-arm bimanual cloth manipulation. The proposed framework combines three key components: (1) a novel gripper design and control strategy for in-gripper cloth sliding with a single robot arm, (2) a Vision Transformer pipeline built on a Vision Foundation Model backbone for cloth part classification (PC-Net) and edge pose estimation (PE-Net) using real and synthetic tactile images, and (3) an encoder-decoder synthetic data generator (SD-Net) that reduces manual annotation by producing high-fidelity tactile images. Experiments show 96% accuracy in distinguishing edges, corners, interior regions, and grasp failures, together with sub-millimeter edge localization and 4.5° orientation error. Real-world results demonstrate reliable cloth unfolding, even for crumpled fabrics, using only a single robotic arm. These results highlight Touch G.O.G. as a compact and cost-effective solution for deformable object manipulation.
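
To make the two edge pose figures above concrete, the following is a minimal sketch of how a localization error and an orientation error could be computed for predicted edge poses. It is an illustration under stated assumptions, not the evaluation code used in the paper: the pose representation `(offset_mm, angle_deg)` and the function names `edge_angle_error_deg` and `mean_edge_pose_error` are hypothetical. The one design point worth noting is that an edge is an undirected line, so angles that differ by 180° describe the same edge and the orientation error is wrapped accordingly.

```python
from typing import Sequence, Tuple

# Hypothetical pose representation (an assumption, not from the paper):
# (edge offset from the sensor centre in mm, edge angle in degrees).
EdgePose = Tuple[float, float]


def edge_angle_error_deg(pred_deg: float, true_deg: float) -> float:
    """Smallest absolute angle between two undirected lines, in [0, 90] degrees."""
    diff = (pred_deg - true_deg) % 180.0  # a line at 0 deg equals one at 180 deg
    return min(diff, 180.0 - diff)


def mean_edge_pose_error(
    preds: Sequence[EdgePose], targets: Sequence[EdgePose]
) -> Tuple[float, float]:
    """Return (mean absolute offset error in mm, mean orientation error in deg)."""
    if len(preds) == 0 or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equally long")
    n = len(preds)
    offset_err = sum(abs(p[0] - t[0]) for p, t in zip(preds, targets)) / n
    angle_err = sum(
        edge_angle_error_deg(p[1], t[1]) for p, t in zip(preds, targets)
    ) / n
    return offset_err, angle_err


if __name__ == "__main__":
    # Made-up values for illustration only; these are not results from the paper.
    predictions = [(1.0, 179.0), (2.0, 10.0)]
    ground_truth = [(1.5, 1.0), (2.0, 14.0)]
    print(mean_edge_pose_error(predictions, ground_truth))
```

Without the wrap-around, a prediction of 179° against a ground truth of 1° would be scored as a 178° error even though the two edges are only 2° apart.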

Paper Structure (24 sections, 11 equations, 13 figures, 3 tables)

Introduction
Related works
Cloth Unfolding
Data Augmentation
Vision-based Tactile Manipulation
Touch G.O.G. System
Decoupled Width Control Gripper (D-WCG)
Tactile Variable Friction Gripper (T-VFG)
T-VFG Closed-Loop Control
Comparative Workspace Analysis
Vision-based Tactile Perception and Control
Cloth Part Classification Network
Data Preparation and Augmentation
Temporal Information for Classification
Network Architecture
...and 9 more sections

Figures (13)

Figure 1: Touch G.O.G. framework enabling single-arm bimanual cloth manipulation. Unlike rigid grippers, our system utilizes a human-inspired sliding strategy: (1) The robot identifies structural regions (e.g., corners), (2) The gripper actively expands and modulates friction to slide along the edge, and (3) Real-time tactile feedback corrects pose errors during sliding until the opposing corner is reached.
Figure 2: Touch G.O.G. overview: (left) the CAD model of the Touch G.O.G., showing the base, the Decoupled Width Control Gripper (D-WCG), and the two Tactile Variable Friction Grippers (T-VFGs) and (right) the versatility of D-WCG and T-VFG.
Figure 3: Detail of the T-VFG in the Touch G.O.G. system, showing the DIGIT sensor for high-resolution tactile feedback and the DC motor that drives the rack-and-pinion mechanism.
Figure 4: Comparison of workspaces: G.O.G. (top) shows the left finger (blue) and right finger (green) workspaces, exhibiting lower dexterity. Touch G.O.G. (bottom) shows larger coverage due to extra degrees of freedom, with left (green) and right (blue) regions superimposed on the CAD model.
Figure 5: Overview of the cloth manipulation framework: The environment setup shows the Touch G.O.G. system integrated with visuotactile sensors. (1) Cloth Part Classification Network (PC-Net) is trained with a SAM backbone and a convolutional head. (2) The SAM-backboned Mask Decoder Network (SD-Net) generates realistic synthetic tactile images from edge annotations. (3) The resulting synthetic dataset is used to train the Edge Pose Estimation Network (PE-Net), which estimates edge pose for (4) precise cloth sliding.
...and 8 more figures
