Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation

Lanpei Li; Enrico Donato; Vincenzo Lomonaco; Egidio Falotico

TL;DR

The paper addresses the challenge of creating a single versatile controller for in-hand manipulation with soft robots across diverse objects while mitigating catastrophic forgetting. It introduces Continual Policy Distillation (CPD), which distills policies from multiple object-specific experts into a unified student policy using exemplar-based rehearsal within a continual learning framework. The authors evaluate offline distillation loss functions and replay strategies, finding KL loss performs best, and demonstrate that CPD can achieve adaptive manipulation across shapes with reduced training time and preserved prior knowledge. Practically, CPD offers a privacy-conscious, memory-efficient path toward deploying soft robotic manipulation controllers capable of handling a wide range of objects in real-world settings.
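A minimal sketch of a KL-based distillation loss is given here for illustration. It assumes that both expert and student expose logits over a discrete action set, which this summary does not confirm; the names `softmax`, `kl_divergence`, `distillation_loss`, and the `temperature` parameter are hypothetical and not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(expert_logits_batch, student_logits_batch, temperature=1.0):
    """Mean KL divergence from expert (teacher) to student over a batch.

    Assumption: each batch entry is a list of action logits for one observation.
    """
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(expert_logits_batch, student_logits_batch)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    expert = [[1.0, 2.0, 3.0]]
    print(distillation_loss(expert, [[1.0, 2.0, 3.0]]))  # student matches expert
    print(distillation_loss(expert, [[3.0, 2.0, 1.0]]))  # student disagrees
```

The loss is zero when the student reproduces the expert's action distribution and grows as the two diverge, which is the signal a student policy would be trained to minimize.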

Abstract

Dexterous manipulation, often facilitated by multi-fingered robotic hands, has significant impact on real-world applications. Soft robotic hands, due to their compliant nature, offer flexibility and adaptability during object grasping and manipulation. Yet these benefits come with challenges, particularly in developing controllers for finger coordination. Reinforcement Learning (RL) can be employed to train object-specific in-hand manipulation policies, but such policies are limited in adaptability and generalizability. We introduce a Continual Policy Distillation (CPD) framework to acquire a versatile controller for in-hand manipulation that rotates objects of different shapes and sizes within a four-fingered soft gripper. The framework leverages Policy Distillation (PD) to transfer knowledge from expert policies to a continually evolving student policy network. Exemplar-based rehearsal methods are then integrated to mitigate catastrophic forgetting and enhance generalization. The performance of the CPD framework across various replay strategies demonstrates its effectiveness in consolidating knowledge from multiple experts and achieving versatile and adaptive behaviours for in-hand manipulation tasks.
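The exemplar-based rehearsal idea can be sketched as a fixed-size buffer of past expert demonstrations that is mixed into each new object's training data. The sketch below uses reservoir sampling as the selection rule, which is an assumption made for illustration and not necessarily one of the replay strategies the paper evaluates; `ExemplarBuffer`, `build_training_batch`, and the sample format are hypothetical.

```python
import random


class ExemplarBuffer:
    """Fixed-capacity memory of past demonstrations (reservoir sampling)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, sample):
        """Offer one sample; every sample seen so far is kept with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = sample

    def sample(self, k):
        """Draw up to k stored exemplars without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def build_training_batch(new_task_samples, buffer, replay_size):
    """Mix demonstrations for the current object with replayed exemplars."""
    return list(new_task_samples) + buffer.sample(replay_size)


if __name__ == "__main__":
    buffer = ExemplarBuffer(capacity=5)
    # Hypothetical demonstrations from an earlier object: (object name, step index).
    for step in range(100):
        buffer.add(("cube", step))
    batch = build_training_batch([("sphere", 0), ("sphere", 1)], buffer, replay_size=3)
    print(len(batch))
```

Training the student on such mixed batches keeps earlier objects represented while a new expert is being distilled, which is how rehearsal counters catastrophic forgetting under a bounded memory budget.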

Paper Structure (16 sections, 2 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Works
Learning Control Policies for Soft Robots
Policy Distillation
Continual Learning in Robotics
Methodology
In-Hand Manipulation Task
Learning Control Policy via Reinforcement Learning
Offline Experts' Demonstrations and Policy Distillation
Knowledge Integration: Continual Policy Distillation
Experimental Results
Expert Control Policy Learning
Continual Policy Distillation
Discussion
Conclusion
...and 1 more sections

Figures (6)

Figure 1: Simulation of In-hand Manipulation
Figure 2: CPD framework pipeline
Figure 3: Objects used for In-Hand Manipulation
Figure 4: Learning curve for cube manipulation
Figure 5: Average performance of experts' policy
...and 1 more figures
