Bidirectional Progressive Neural Networks with Episodic Return Progress for Emergent Task Sequencing and Robotic Skill Transfer

Suzan Ece Ada, Hanne Say, Emre Ugur, Erhan Oztop

TL;DR

The paper tackles autonomous, continual multi-task reinforcement learning for robots with different morphologies by introducing ERP-BPNN. It combines a Bidirectional Progressive Neural Network with an intrinsic motivation signal called Episodic Return Progress to drive soft, online task switching and enable bidirectional skill transfer. Empirical results on reaching tasks show faster convergence and superior performance across multiple metrics compared with baselines, demonstrating effective interleaved learning and transfer without requiring a task to converge first. This approach advances lifelong robotics by enabling scalable, interleaved multi-task learning and transfer in heterogeneous robotic systems with potential real-world impact.

Abstract

Human brain and behavior provide a rich venue that can inspire novel control and learning methods for robotics. In an attempt to exemplify such a development by drawing inspiration from how humans acquire knowledge and transfer skills among tasks, we introduce a novel multi-task reinforcement learning framework named Episodic Return Progress with Bidirectional Progressive Neural Networks (ERP-BPNN). The proposed ERP-BPNN model (1) learns in a human-like interleaved manner by (2) autonomous task switching based on a novel intrinsic motivation signal and, in contrast to existing methods, (3) allows bidirectional skill transfer among tasks. ERP-BPNN is a general architecture applicable to several multi-task learning settings; in this paper, we present the details of its neural architecture and show its ability to enable effective learning and skill transfer among morphologically different robots in a reaching task. The developed Bidirectional Progressive Neural Network (BPNN) architecture enables bidirectional skill transfer without requiring incremental training and seamlessly integrates with online task arbitration. The task arbitration mechanism developed is based on soft Episodic Return Progress (ERP), a novel intrinsic motivation (IM) signal. To evaluate our method, we use quantifiable robotics metrics such as 'expected distance to goal' and 'path straightness' in addition to the usual reward-based measure of episodic return common in reinforcement learning. With simulation experiments, we show that ERP-BPNN achieves faster cumulative convergence and improves performance in all metrics considered among morphologically different robots compared to the baselines.
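
The abstract describes task arbitration driven by a soft Episodic Return Progress signal but does not give its formula here. The following is a minimal sketch of one plausible reading, not the authors' implementation: progress is assumed to be the difference between the mean episodic return of the most recent window and that of the window before it, and "soft" selection is assumed to mean sampling a task from a softmax over those progress values. The function names, the `temperature` parameter, and the exact progress definition are all illustrative assumptions; the default window of 35 echoes the iteration window size quoted in the Figure 5 caption.

```python
import numpy as np

def episodic_return_progress(returns, window=35):
    """ASSUMED definition: mean return of the latest window minus the
    mean return of the window before it. Returns 0.0 until two full
    windows of history are available."""
    r = np.asarray(returns, dtype=float)
    if r.size < 2 * window:
        return 0.0
    recent = r[-window:].mean()
    previous = r[-2 * window:-window].mean()
    return float(recent - previous)

def soft_task_probabilities(progress, temperature=1.0):
    """Softmax over per-task progress values (temperature is hypothetical)."""
    z = np.asarray(progress, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def select_task(return_histories, rng, window=35, temperature=1.0):
    """Sample the next task to train, favouring tasks whose return is improving."""
    progress = [episodic_return_progress(h, window) for h in return_histories]
    probs = soft_task_probabilities(progress, temperature)
    task = int(rng.choice(len(probs), p=probs))
    return task, probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    histories = [
        [0.0, 0.0, 1.0, 1.0],  # task 0: return is improving
        [1.0, 1.0, 1.0, 1.0],  # task 1: return has plateaued
    ]
    task, probs = select_task(histories, rng, window=2)
    print("selection probabilities:", probs, "chosen task:", task)
```

Because selection is probabilistic rather than a hard argmax, a plateaued task keeps a nonzero chance of being revisited, which is consistent with the interleaved learning the abstract describes.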

Paper Structure (20 sections, 5 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: (a) 2-DoF, (b) 3-DoF, (c) 4-DoF Reacher Robot Arm Environments
  • Figure 2: (a, b) High-level view of the ERP-BPNN architecture, in which ERP selects Task 1 (a) and then Task 2 (b), showcasing bidirectional flow for skill transfer in a many-to-many fashion among three tasks. (c) Graphical representation of the ERP-BPNN framework with ERP task switching, wherein Task 1 is selected to learn. Red arrows denote weight updates for Task 1; dashed gray arrows indicate no gradient flow during the learning update. In the current report, Tasks 1, 2, and 3 refer to the RL tasks for the 2-DoF, 3-DoF, and 4-DoF Reacher Robot arms illustrated in Fig. 1 (a), (b), and (c), respectively.
  • Figure 3: Performances of the proposed model, ERP-BPNN, and the two baselines of RANDOM-BPNN and RANDOM-MLP across five random seeds are shown in terms of (a) maximum episodic return, (b) minimum expected final end-effector distance to goal, and (c) minimum expected deviation from the shortest path to the goal.
  • Figure 4: Policies obtained by our model and the baselines during multi-task learning (at the $1750^{\text{th}}$ policy update), shown in terms of (a) straightness and (b) endpoint accuracy.
  • Figure 5: Selection frequency plot of task switching by average episodic return progress with an iteration window size of $\nu=35$
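
The captions of Figures 3 and 4 refer to two robotics metrics: the final end-effector distance to the goal and the deviation from the shortest path to the goal. The source does not spell out how they are computed, so the sketch below is only an assumption: the shortest path is taken to be the straight segment from the first end-effector position to the goal, and deviation is the mean distance from each trajectory point to that segment. The function names are hypothetical. The "expected" values in the captions would presumably be averages of such per-episode quantities over many episodes.

```python
import numpy as np

def final_distance_to_goal(path, goal):
    """Euclidean distance between the last end-effector position and the goal."""
    path = np.asarray(path, dtype=float)
    goal = np.asarray(goal, dtype=float)
    return float(np.linalg.norm(path[-1] - goal))

def mean_deviation_from_straight_path(path, goal):
    """ASSUMED straightness measure: mean distance from each trajectory
    point to the straight segment joining the start position and the goal."""
    path = np.asarray(path, dtype=float)
    goal = np.asarray(goal, dtype=float)
    start = path[0]
    seg = goal - start
    seg_len_sq = float(seg @ seg)
    if seg_len_sq == 0.0:
        # Start coincides with the goal: measure distance to that point.
        return float(np.linalg.norm(path - start, axis=1).mean())
    # Project every point onto the segment and clamp to its endpoints.
    t = np.clip((path - start) @ seg / seg_len_sq, 0.0, 1.0)
    closest = start + t[:, None] * seg
    return float(np.linalg.norm(path - closest, axis=1).mean())

if __name__ == "__main__":
    goal = [2.0, 0.0]
    curved = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
    straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    print(final_distance_to_goal(curved, goal))
    print(mean_deviation_from_straight_path(curved, goal))
    print(mean_deviation_from_straight_path(straight, goal))
```

Under this definition a perfectly straight reach scores zero deviation, and lower values indicate straighter paths, matching the "minimum expected deviation" phrasing of the Figure 3 caption.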