DexSkills: Skill Segmentation Using Haptic Data for Learning Autonomous Long-Horizon Robotic Manipulation Tasks

Xiaofeng Mao; Gabriele Giudici; Claudio Coppola; Kaspar Althoefer; Ildar Farkhatdinov; Zhibin Li; Lorenzo Jamone

TL;DR

DexSkills tackles the challenge of learning long-horizon dexterous manipulation by decomposing tasks into a reusable library of primitive skills learned from human demonstrations using only proprioceptive and tactile data. It combines learned features, primitive skill policies, and supervised representation learning (an auto-regressive autoencoder with a label decoder) to segment unseen demonstrations into skill sequences and execute them autonomously. The method defines 20 primitive skills and demonstrates autonomous execution of long-horizon tasks with high segmentation accuracy, supported by ablations and a public teleoperation dataset. The work advances data-efficient, hardware-centric imitation learning for real-world dexterous manipulation and provides a practical benchmark for haptic-data segmentation and one-shot task composition.

Abstract

Effective execution of long-horizon tasks with dexterous robotic hands remains a significant challenge in real-world settings. While learning from human demonstrations has shown encouraging results, it requires extensive data collection for training. Decomposing long-horizon tasks into reusable primitive skills is therefore a more efficient approach. To this end, we developed DexSkills, a novel supervised learning framework that addresses long-horizon dexterous manipulation tasks using primitive skills. DexSkills is trained to recognize and replicate a select set of skills from human demonstration data; it can then segment a demonstrated long-horizon dexterous manipulation task into a sequence of primitive skills, enabling one-shot execution by the robot. Significantly, DexSkills operates solely on proprioceptive and tactile data, i.e., haptic data. Our real-world robotic experiments show that DexSkills can accurately segment skills, thereby enabling autonomous robot execution of a diverse range of tasks.
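
To illustrate what "segmenting a demonstration into a sequence of primitive skills" can look like in practice, the sketch below turns per-window skill predictions into an ordered list of skill segments. It is a minimal illustration, not the authors' implementation: the skill names (`reach`, `grasp`, `lift`), the majority-vote smoothing step, and the function names `smooth_labels` and `to_skill_sequence` are all assumptions introduced here.

```python
from collections import Counter
from itertools import groupby


def smooth_labels(labels, radius=1):
    """Majority-vote each label over a neighbourhood of +/- radius windows.

    Hypothetical post-processing step (an assumption, not from the paper):
    it suppresses isolated misclassifications before segments are formed.
    """
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - radius)
        hi = min(len(labels), i + radius + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        # Keep the current label on ties so boundaries are not moved arbitrarily.
        if counts[labels[i]] == best:
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def to_skill_sequence(labels):
    """Collapse runs of identical labels into (skill, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in window indices.
    """
    segments = []
    start = 0
    for skill, run in groupby(labels):
        length = len(list(run))
        segments.append((skill, start, start + length))
        start += length
    return segments


# Hypothetical per-window classifier output for one demonstration.
window_labels = ["reach", "reach", "grasp", "reach", "reach",
                 "grasp", "grasp", "grasp", "lift", "lift"]
skill_sequence = to_skill_sequence(smooth_labels(window_labels))
print(skill_sequence)
```

In this sketch the isolated `grasp` at window 2 is voted away, and the resulting segment list is the kind of skill sequence that per-skill policies could then execute in order.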

Paper Structure (20 sections, 3 equations, 6 figures, 3 tables)

Introduction
Related Works
Methods
Learning Features
Primitive Skills Learning
Supervised Representation Learning
Unknown Long-Horizon Manipulation Skills Learning
Experimental Setting
Teleoperation Setup
The Primitive Skills
Long-Horizon Tasks
Data Collection
Experimental Results
Classifier Evaluation
Framework Validation
...and 5 more sections

Figures (6)

Figure 1: Overview of the proposed long-horizon task segmentation approach. Individual skills are segmented and classified at each temporal window of the demonstration. The demonstrations are collected via the teleoperation system (left) developed in [giudici2023feeling].
Figure 2: The leader agent generates motor control commands for the end effector pose and finger joints of the hand. The follower robot executes corresponding actions based on these commands. During teleoperation, the follower robot provides haptic feedback. When operating the robot autonomously, we control the robot using a distinct MLP trained on the proprioceptive and tactile data (i.e. haptic data) of each separate skill.
Figure 3: The architecture of our neural network for supervised representation learning incorporates an auto-regressive autoencoder and a label decoder. The network takes time-series feature data as input, and the encoder maps these features into a latent space. The temporal decoder reconstructs the features along with their predictions, whereas the label decoder extracts labels from the latent vectors. The label decoder is trained jointly with the autoencoder, which yields latent features that improve segmentation performance.
Figure 4: Confusion matrix (%) of the segmentation system on the long-horizon demonstrations (detailed in the Long-Horizon Tasks section).
Figure 5: t-SNE visualization of the classifier latent features. Each point in the graph corresponds to a primitive skill instance; colours distinguish the primitive skills.
...and 1 more figures
