DeFNet: Deconstructed Strategy for Multi-step Fabric Folding Tasks

Ningquan Gu; Ruhan He; Lianqing Yu

TL;DR

DeFNet tackles long-horizon fabric folding by decomposing the task into three modules: a Folding Planning Module (FPM) that operates in latent space to infer the shortest folding paths, a Folding Action Module (FAM) that uses FlowNet-based optical flow to determine grasp-and-place actions, and an Iterative Interactive Module (IIM) that re-plans after each action to mitigate execution drift. The FPM uses a Variational Autoencoder (VAE) and a Latent Space Roadmap (LSR) to map a start-goal pair to a sequence of intermediate states, while the FAM computes actions with a flow-based policy built from FlowNet and PickNet. The IIM closes the loop by feeding the current observation back in as the new start state and repeating planning and execution until the goal is reached. In simulation, DeFNet outperforms three state-of-the-art baselines on multi-step folding tasks, and ablations show significant gains from incorporating latent-space folding paths and iterative re-planning. The method is also deployed on a real robotic system.
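The FPM's planning step can be illustrated with a simplified sketch. It assumes the roadmap has already been built: `nodes` holds one representative latent code per roadmap node, and `adj` lists which nodes are joined by a single grasp-and-place action (constructing it from training data is out of scope here). Start and goal codes are snapped to their nearest node, every shortest path is enumerated with breadth-first search, and one is picked at random, mirroring the random choice among shortest paths described in the Figure 1 caption. The names `nearest_node`, `all_shortest_paths` and `plan_folding_path` are hypothetical, VAE decoding is omitted, and the toy 2-D codes stand in for real VAE latents.

```python
import math
import random
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple

Latent = Tuple[float, ...]


def nearest_node(z: Latent, nodes: Sequence[Latent]) -> int:
    """Index of the roadmap node whose latent code is closest to z (Euclidean)."""
    return min(range(len(nodes)), key=lambda i: math.dist(z, nodes[i]))


def all_shortest_paths(adj: Dict[int, List[int]], src: int, dst: int) -> List[List[int]]:
    """Enumerate every shortest path from src to dst in an unweighted graph."""
    dist = {src: 0}
    preds: Dict[int, List[int]] = {src: []}
    queue = deque([src])
    while queue:  # BFS, keeping every predecessor that lies on a shortest path
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)
    if dst not in dist:
        return []  # goal node unreachable from start node
    paths: List[List[int]] = []

    def backtrack(node: int, suffix: List[int]) -> None:
        # Walk predecessor lists from dst back to src to rebuild each path.
        if node == src:
            paths.append([src] + suffix)
            return
        for p in preds[node]:
            backtrack(p, [node] + suffix)

    backtrack(dst, [])
    return paths


def plan_folding_path(z_start, z_goal, nodes, adj, rng=random) -> Optional[List[Latent]]:
    """Latent codes along one randomly chosen shortest path from start to goal."""
    paths = all_shortest_paths(adj, nearest_node(z_start, nodes), nearest_node(z_goal, nodes))
    if not paths:
        return None
    return [nodes[i] for i in rng.choice(paths)]


# Toy roadmap: flat (0), two different half folds (1, 2), and a quarter fold (3).
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(all_shortest_paths(adj, 0, 3))  # [[0, 1, 3], [0, 2, 3]]
```

Keeping every shortest path, rather than a single one, is what makes the random selection among equally short folding sequences possible.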

Abstract

Robotic fabric folding is complex and challenging because fabric is highly deformable. Following a deconstruction strategy, we split the complex fabric folding task into three relatively simple sub-tasks and propose the Deconstructed Fabric Folding Network (DeFNet), which comprises three corresponding modules to solve them. (1) The Folding Planning Module (FPM), based on a Latent Space Roadmap, infers the most direct sequence of intermediate folding states from the start to the goal in latent space. (2) The Folding Action Module (FAM), a flow-based approach, calculates the action coordinates and executes them to reach each inferred intermediate state. (3) We introduce an Iterative Interactive Module (IIM) for fabric folding tasks, which re-executes the FPM and FAM after every grasp-and-place action until the fabric reaches the goal. Experimentally, we evaluate our method on multi-step fabric folding tasks against three baselines in simulation. We also apply the method to an existing robotic system and report its performance.
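The IIM's closed loop can be sketched as follows. The callables `observe`, `plan` (standing in for the FPM), `act` (standing in for the FAM), `execute` and `reached`, as well as the `max_steps` budget, are hypothetical placeholders; only the control flow, re-planning from the current observation after every single grasp-and-place action, follows the description above. The toy usage replaces fabric states with integers and makes the second action fail, to mimic the drift between inferred sub-goals and achieved states.

```python
from typing import Any, Callable, List, Sequence, Tuple


def iterative_fold(
    observe: Callable[[], Any],
    goal: Any,
    plan: Callable[[Any, Any], Sequence[Any]],
    act: Callable[[Any, Any], Tuple[Any, Any]],
    execute: Callable[[Any, Any], None],
    reached: Callable[[Any, Any], bool],
    max_steps: int = 10,
) -> Tuple[bool, List[Tuple[Any, Any]]]:
    """Plan, take one grasp-and-place action, observe, and re-plan until done."""
    history: List[Tuple[Any, Any]] = []
    for _ in range(max_steps):
        obs = observe()                      # current observation = new start state
        if reached(obs, goal):
            return True, history
        subgoals = plan(obs, goal)           # FPM stand-in: states after obs, ending at goal
        if not subgoals:
            break                            # no path found from this state
        pick, place = act(obs, subgoals[0])  # FAM stand-in: action toward the next sub-goal
        execute(pick, place)                 # only this one action is executed
        history.append((pick, place))
    return reached(observe(), goal), history


# Toy usage: integer "states", goal 3; the second executed action slips.
state = {"s": 0, "n": 0}


def execute(pick, place):
    state["n"] += 1
    if state["n"] != 2:  # simulated drift: action 2 leaves the state unchanged
        state["s"] = place


ok, actions = iterative_fold(
    observe=lambda: state["s"],
    goal=3,
    plan=lambda obs, goal: list(range(obs + 1, goal + 1)),
    act=lambda obs, sub: (obs, sub),
    execute=execute,
    reached=lambda obs, goal: obs == goal,
)
print(ok, actions)  # True [(0, 1), (1, 2), (1, 2), (2, 3)]
```

Executing only the first action before re-planning is what lets the loop absorb the slip in the second step: the planner simply starts again from the state actually reached.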

Paper Structure

This paper contains 22 sections, 7 equations, 9 figures, and 4 tables.

Introduction
Related work
Learning for fabric manipulation
Latent space roadmap
Optical flow for policy learning
Problem statement
Approach
Method overview
Folding planning module
Folding action module
Iterative interactive module
Experiment
Training dataset
Implementation details
Network implementation details
...and 7 more sections

Figures (9)

Figure 1: Our proposed approach for fabric folding deconstructs the task into three modules. First, the Folding Planning Module (FPM) reasons about the intermediate folding states of the fabric, given the initial and goal states. The FPM generates a sequence of images with red borders, representing the inferred intermediate states. For multi-fold tasks, there are typically several equally short folding paths. Second, the Folding Action Module (FAM) uses a flow-based approach: we randomly select one of the shortest paths and feed the corresponding images into the FAM to obtain the grasp-and-place actions between consecutive intermediate folding states. Finally, the Iterative Interactive Module (IIM) carries out each grasp-and-place action (indicated by a green arrow on the images) and feeds the current observation of the fabric back into the FPM as a new start state. The IIM repeats the first and second modules after each grasp-and-place action until the goal state (the image with a green border) is reached. By combining these three modules, our approach can efficiently and effectively solve complex fabric folding tasks.
Figure 2: The pipeline of our approach
Figure 3: The folding planning process consists of several steps. First, the Variational Autoencoder (VAE) encodes the input images (a) into a low-dimensional latent representation (b). In this latent space, manipulation planning is performed on a constructed Latent Space Roadmap (c), which determines the shortest sequence of intermediate folding states that reaches the goal state. Finally, the VAE decodes the predicted latent states (d) back into the high-dimensional image space (e). Combining these decoded images with the initial and goal states yields the complete shortest folding plan.
Figure 4: The Folding Action Module (FAM) utilizes a flow policy to compute the grasp-and-place points and consists of a FlowNet and a PickNet. The FlowNet is responsible for computing the optical flow of each particle of the fabric, which provides information on how the fabric moves and changes shape during the folding process. The PickNet then reasons about the optimal pick point based on the computed optical flow. Finally, the place point is determined by querying the flow arrow of the pick point, which provides information on the direction and distance of the flow.
Figure 5: Picture (a) shows an inferred intermediate folding sub-goal generated by the FPM, while picture (b) shows the actual state achieved after a grasp-and-place action. As shown, there is a slight visible difference between the two, indicating that there may be deviations between the inferred sub-goals and the actual achieved states.
...and 4 more figures
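The place-point rule from Figure 4 can be sketched on a per-pixel flow field, assumed here to be a nested list where `flow[row][col] = (dx, dy)` gives the predicted displacement in pixels along the column and row directions. PickNet is a learned network in the paper; `pick_by_max_flow` is only a hypothetical stand-in that picks the fabric pixel with the largest flow magnitude, so the example runs without a trained model. `place_from_flow` (also a hypothetical name) then queries the flow arrow at the pick point, as the caption describes, rounding and clamping the result to the image.

```python
import math
from typing import Optional, Sequence, Tuple

Flow = Sequence[Sequence[Tuple[float, float]]]  # flow[row][col] = (dx, dy) in pixels


def pick_by_max_flow(flow: Flow, mask: Optional[Sequence[Sequence[bool]]] = None) -> Tuple[int, int]:
    """Hypothetical PickNet stand-in: fabric pixel with the longest flow vector."""
    best: Optional[Tuple[int, int]] = None
    best_mag = -1.0
    for r, row in enumerate(flow):
        for c, (dx, dy) in enumerate(row):
            if mask is not None and not mask[r][c]:
                continue  # skip pixels that are not fabric
            mag = math.hypot(dx, dy)
            if mag > best_mag:  # strict '>' keeps the first pixel on ties
                best, best_mag = (r, c), mag
    if best is None:
        raise ValueError("mask contains no fabric pixels")
    return best


def place_from_flow(flow: Flow, pick: Tuple[int, int]) -> Tuple[int, int]:
    """Place point = pick point moved by the flow arrow queried at that pixel."""
    r, c = pick
    dx, dy = flow[r][c]
    h, w = len(flow), len(flow[0])
    # dx moves along columns, dy along rows; round and clamp to the image.
    place_r = min(max(round(r + dy), 0), h - 1)
    place_c = min(max(round(c + dx), 0), w - 1)
    return place_r, place_c


# Toy flow on a 4x4 image: the two left columns fold over onto the right half.
H, W = 4, 4
flow = [[(0.0, 0.0)] * W for _ in range(H)]
for r in range(H):
    flow[r][0] = (3.0, 0.0)
    flow[r][1] = (1.0, 0.0)
pick = pick_by_max_flow(flow)
print(pick, place_from_flow(flow, pick))  # (0, 0) (0, 3)
```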
