GPT-Fabric: Smoothing and Folding Fabric by Leveraging Pre-Trained Foundation Models

Vedant Raval; Enyu Zhao; Hejia Zhang; Stefanos Nikolaidis; Daniel Seita

TL;DR

The proposed GPT-Fabric is a promising approach for high-precision fabric manipulation tasks that matches the state-of-the-art in fabric smoothing, and also achieves comparable performance with most prior fabric folding methods tested.

Abstract

Fabric manipulation has applications in folding blankets, handling patient clothing, and protecting items with covers. It is challenging for robots to perform fabric manipulation since fabrics have infinite-dimensional configuration spaces, complex dynamics, and may be in folded or crumpled configurations with severe self-occlusions. Prior work on robotic fabric manipulation relies either on heavily engineered setups or learning-based approaches that create and train on robot-fabric interaction data. In this paper, we propose GPT-Fabric for the canonical tasks of fabric smoothing and folding, where GPT directly outputs an action informing a robot where to grasp and pull a fabric. We perform extensive experiments in simulation to test GPT-Fabric against prior methods for smoothing and folding. GPT-Fabric matches the state-of-the-art in fabric smoothing, and also achieves comparable performance with most prior fabric folding methods tested, even without explicitly training on a fabric-specific dataset (i.e., zero-shot manipulation). Furthermore, we apply GPT-Fabric in physical experiments over 10 smoothing and 12 folding rollouts. Our results suggest that GPT-Fabric is a promising approach for high-precision fabric manipulation tasks.

Paper Structure (32 sections, 7 figures, 5 tables)

Introduction
Related Work
Fabric Manipulation
Foundation Models in Robotics
Problem Statement
Smoothing
Folding
Method
GPT-Fabric: Overall Structure
GPT-Fabric for Smoothing
GPT-Fabric for Folding
Simulation Experiments
Fabric Smoothing Experiments
Fabric Folding Experiments
Data Sizes
...and 17 more sections

Figures (7)

Figure 1: Top: high-level overview of GPT-Fabric. The input to the foundation model (GPT) is the current image observation of the fabric and the task information. The latter might include fabric manipulation strategies generated by a VLM, natural language descriptions to prompt the foundation models, and (for folding tasks) the subgoal sequence targets (see Figure 2). GPT-Fabric directly produces actions (e.g., pick-and-place) for a robot. Bottom: example rollouts of GPT-Fabric for smoothing and folding.
Figure 2: Subgoal sequences for the four folding tasks we consider (from Foldsformer, Mo et al., 2022) with maximum rollout lengths $T$. We use these for folding in simulation (see the Fabric Folding Experiments section) and in physical experiments (see the physical experiments section).
Figure 3: Our GPT-Fabric method, for smoothing (left) and folding (right). The input prompt to GPT includes the current RGB-D image of the fabric $\mathbf{o}_t$ (only showing RGB above) and the task information. For folding, this information includes subgoal sequences (see Figure 2). We also preprocess $\mathbf{o}_t$ for GPT-Fabric and use an evaluation module for smoothing; see Figure 4 for details.
Figure 4: Details of image annotation and evaluation for smoothing (see the GPT-Fabric for Smoothing section for more). From RGB-D image $\mathbf{o}_t$, we get a bounding box and approximate fabric center (a) via masking (b). If applicable, we annotate the prior placing point and its "symmetric" point about the fabric center (c). We detect corners (d) on the masked image, then combine (c) and (d) to get image (e) as input to GPT-4V. We use an evaluation module to verify GPT's output. If it fails, we ask GPT to try again with a correction message.
Figure 5: Qualitative results for smoothing (left) and folding (right) from GPT-Fabric in SoftGym simulation. We show three smoothing rollouts of varying frame lengths, where each frame shows one pick-and-place action. In all three smoothing rollouts, GPT-Fabric achieved NI $> 0.95$. We show three examples of folding rollouts with different folding subgoals (see Figure 2). GPT-Fabric was unable to achieve qualitatively good results for Corners Edges Inward.
...and 2 more figures
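
The preprocessing and retry steps described in the Figure 4 caption can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper's code: the function names, the choice of the bounding-box midpoint as the "approximate fabric center", and the validity check (the pick point must lie on the fabric mask) are all assumptions made for illustration. Corner detection and the actual GPT-4V call are omitted; the model is stood in for by a plain callable.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates
BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def bbox_and_center(mask: List[List[bool]]) -> Tuple[BBox, Point]:
    """Bounding box of the True (fabric) pixels and its midpoint.

    Assumption: the approximate fabric center is the bounding-box midpoint.
    """
    xs = [x for row in mask for x, is_fabric in enumerate(row) if is_fabric]
    ys = [y for y, row in enumerate(mask) for is_fabric in row if is_fabric]
    if not xs:
        raise ValueError("mask contains no fabric pixels")
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    center = ((x_min + x_max) // 2, (y_min + y_max) // 2)
    return (x_min, y_min, x_max, y_max), center


def symmetric_point(p: Point, center: Point) -> Point:
    """Reflect p through the center (point reflection)."""
    return (2 * center[0] - p[0], 2 * center[1] - p[1])


def query_with_retry(
    propose: Callable[[Optional[str]], Point],
    is_valid: Callable[[Point], bool],
    max_tries: int = 3,
) -> Optional[Point]:
    """Ask a proposer for a pick point, re-asking with a correction message.

    `propose` stands in for the foundation-model call and receives the
    correction message from the previous failed attempt (None on the first).
    """
    feedback: Optional[str] = None
    for _ in range(max_tries):
        pick = propose(feedback)
        if is_valid(pick):
            return pick
        feedback = f"Pick point {pick} failed verification; choose again."
    return None  # no valid proposal within the attempt budget
```

In use, `is_valid` could be as simple as `lambda p: mask[p[1]][p[0]]`, so that a proposal landing off the fabric triggers another query with the correction message attached.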
