Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation

Yuanpei Chen; Chen Wang; Li Fei-Fei; C. Karen Liu

TL;DR

The paper tackles the difficulty of executing long-horizon manipulation with dexterous hands by learning and chaining multiple high-dimensional policies. It introduces Sequential Dexterity, a bi-directional framework that uses a Transition Feasibility Function to both fine-tune preceding policies (backward) and determine optimal policy switches (forward), enhancing robustness and success in complex multi-step tasks. Empirical results in simulation and real-world Lego-like block building and tool-positioning tasks show significant gains over uni-directional baselines, with strong zero-shot transfer capabilities. The approach reduces reward-design complexity through backward goal transmission and demonstrates broad potential for general skill chaining beyond dexterous manipulation.

Abstract

Many real-world manipulation tasks consist of a series of subtasks that are significantly different from one another. Such long-horizon, complex tasks highlight the potential of dexterous hands, which are adaptable and versatile and can seamlessly transition between different modes of functionality without the need for re-grasping or external tools. However, challenges arise from the high-dimensional action space of dexterous hands and the complex compositional dynamics of long-horizon tasks. We present Sequential Dexterity, a general system based on reinforcement learning (RL) that chains multiple dexterous policies to achieve long-horizon task goals. The core of the system is a transition feasibility function that progressively fine-tunes the sub-policies to enhance the chaining success rate, while also enabling autonomous policy switching for recovery from failures and bypassing redundant stages. Despite being trained only in simulation with a few task objects, our system generalizes to novel object shapes and transfers zero-shot to a real-world robot equipped with a dexterous hand. Code and videos are available at https://sequential-dexterity.github.io

Paper Structure (70 sections, 8 equations, 16 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Dexterous manipulation.
Long-horizon robot manipulation.
Skill-chaining.
Problem Setups
Constructing a structure of blocks.
Tool positioning.
Sequential Dexterity
Learning dexterous sub-policies
Policy chaining with transition feasibility function
Learning transition feasibility function.
Backward policy fine-tuning.
Policy switching with transition feasibility function
Implementation details
...and 55 more sections

Figures (16)

Figure 1: We present Sequential Dexterity, a system that learns to chain multiple versatile dexterous manipulation motions for tackling long-horizon tasks (e.g., building a block structure from a pile of blocks), which is able to zero-shot transfer to the real world.
Figure 2: Overview of the environment setups. (a) Workspace of the Building Blocks task in simulation and in the real world. (b) The setup of the Tool Positioning task. Initially, the tool is placed on the table in a random pose, and the dexterous hand needs to grasp the tool and re-orient it to a ready-to-use pose. The comparison results illustrate how the choice of grasp directly influences the subsequent re-orientation.
Figure 2: Results for the tool positioning task
Figure 3: Overview of Sequential Dexterity. (a) A bi-directional optimization scheme consists of a forward initialization process and a backward fine-tuning mechanism based on the transition feasibility function. (b) The learned system is able to zero-shot transfer to the real world. The transition feasibility function serves as a policy-switching identifier to select the most appropriate policy to execute.
Figure 4: Examples of policy-switching with transition feasibility function. Each example contains an image from the wrist-mounted camera (left) and its corresponding feasibility score $c_i$ output by the transition feasibility function (right). We highlight the target block in the image for better visualization. The policy-switching process visits each sub-policy in reverse order. The first sub-policy with a feasibility score $c_i>1.0$ is selected for execution.
...and 11 more figures
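
The policy-switching rule described in the Figure 4 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `select_sub_policy`, the representation of each feasibility function as a plain callable on an observation, and the choice to return `None` when no score exceeds the threshold are all assumptions made here. Only the reverse visiting order and the $c_i > 1.0$ threshold come from the caption.

```python
from typing import Any, Callable, Optional, Sequence

# Hypothetical type: maps an observation to a feasibility score c_i.
FeasibilityFn = Callable[[Any], float]


def select_sub_policy(
    feasibility_fns: Sequence[FeasibilityFn],
    observation: Any,
    threshold: float = 1.0,  # threshold taken from the Figure 4 caption
) -> Optional[int]:
    """Visit sub-policies in reverse order and return the index of the
    first one whose feasibility score exceeds the threshold.

    Returning None when nothing qualifies is an assumption of this sketch;
    the source does not say what happens in that case.
    """
    for i in reversed(range(len(feasibility_fns))):
        score = feasibility_fns[i](observation)
        if score > threshold:
            return i
    return None


# Toy usage with constant scores standing in for learned functions.
toy_fns = [lambda obs: 1.5, lambda obs: 0.2, lambda obs: 1.2]
print(select_sub_policy(toy_fns, observation=None))  # -> 2
```

Visiting the sub-policies from last to first means the latest feasible stage is preferred, which is consistent with the abstract's description of bypassing redundant stages, while falling back to an earlier stage corresponds to recovery from a failure.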
