
Learning from Demonstration Framework for Multi-Robot Systems Using Interaction Keypoints and Soft Actor-Critic Methods

Vishnunandan L. N. Venkatesh, Byung-Cheol Min

TL;DR

This work addresses learning from demonstration (LfD) in multi-robot systems by combining visual demonstrations and Interaction Keypoints (IKs) with Soft Actor-Critic (SAC) policies. It proposes a vision-based LfD framework with four modules: Vision Tracking, Task Policy Inference, RL Skill Learning, and Robot Execution. For unseen contact skills, a classifier-based reward guides RL; the reward is defined as $R = w_1 C(I_n) + w_2 IK_{reward} - w_3 IK_{Fail_{penalty}}$. The approach enables behavior-based and contact-based skill learning from a single demonstration for many tasks. It is demonstrated on Intruder Attack, Leader Follower, Object Transport, Object Rotate, and Object Color Sorting with real Hamster robots, achieving high success rates and showing robustness to object changes. The results suggest real-time, sim-to-real-friendly learning with reduced demonstration requirements, and potential extension to heterogeneous robots and trajectory-based skills.
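The reward above is a weighted sum, which a few lines of Python can make concrete. This is a minimal sketch, not the authors' implementation. The names `RewardWeights` and `sac_reward` are hypothetical, and the default weight values are placeholders, since the summary does not state $w_1$, $w_2$, or $w_3$. It also assumes that $C(I_n)$ is a scalar classifier score for image $I_n$ and that the two IK terms are non-negative scalars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Weights w_1, w_2, w_3 of the reward (placeholder values, not from the paper)."""
    w1: float = 1.0
    w2: float = 0.5
    w3: float = 0.5


def sac_reward(
    classifier_score: float,
    ik_reward: float,
    ik_fail_penalty: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Compute R = w1 * C(I_n) + w2 * IK_reward - w3 * IK_Fail_penalty.

    classifier_score : assumed scalar output C(I_n) of the success classifier
    ik_reward        : assumed bonus for reaching an Interaction Keypoint
    ik_fail_penalty  : assumed penalty magnitude for missing an Interaction Keypoint
    """
    return (
        weights.w1 * classifier_score
        + weights.w2 * ik_reward
        - weights.w3 * ik_fail_penalty
    )


if __name__ == "__main__":
    # Step judged successful by the classifier, with a keypoint reached.
    print(sac_reward(classifier_score=1.0, ik_reward=1.0, ik_fail_penalty=0.0))
    # Step judged unsuccessful, with a keypoint failure.
    print(sac_reward(classifier_score=0.0, ik_reward=0.0, ik_fail_penalty=1.0))
```

Keeping the weights in a separate object makes it easy to rebalance the classifier term against the keypoint terms without touching the reward function.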

Abstract

Learning from Demonstration (LfD) is a promising approach to enable Multi-Robot Systems (MRS) to acquire complex skills and behaviors. However, the intricate interactions and coordination challenges in MRS pose significant hurdles for effective LfD. In this paper, we present a novel LfD framework specifically designed for MRS, which leverages visual demonstrations to capture and learn from robot-robot and robot-object interactions. Our framework introduces the concept of Interaction Keypoints (IKs) to transform the visual demonstrations into a representation that facilitates the inference of various skills necessary for the task. The robots then execute the task using sensorimotor actions and reinforcement learning (RL) policies when required. A key feature of our approach is the ability to handle unseen contact-based skills that emerge during the demonstration. In such cases, RL is employed to learn the skill using a classifier-based reward function, eliminating the need for manual reward engineering and ensuring adaptability to environmental changes. We evaluate our framework across a range of mobile robot tasks, covering both behavior-based and contact-based domains. The results demonstrate the effectiveness of our approach in enabling robots to learn complex multi-robot tasks and behaviors from visual demonstrations.

Paper Structure (16 sections, 8 equations, 5 figures, 2 tables)


Figures (5)

  • Figure 1: Concept for learning from demonstration for multi-robot systems (MRS). The human expert demonstrator shows multiple tasks to the MRS, which are then learned and executed.
  • Figure 2: The proposed learning from demonstration framework for multi-robot systems follows a streamlined process. Human experts visually demonstrate tasks, which are captured by a 2D camera. These demonstrations undergo feature extraction in the Vision Tracking Module. The Task Policy Inference Module segments the demonstrations and identifies Interaction Keypoints, forming a Task Policy. When new contact skills arise, the RL Skill Learning/Practice Module, using SAC networks, learns them with guidance from a binary decision classifier's reward signals. Finally, the Robot Execution Module allocates and executes tasks across multiple robots, showcasing the adaptability of the framework in various environmental conditions.
  • Figure 3: A priori skills are modeled as skills that each individual robot can perform on its own; they do not constitute multi-robot skills.
  • Figure 4: The experimental testbed: Hamster robots in the environment, with a mounted overhead camera attached to the system.
  • Figure 5: Examples of demonstrations and task execution. The objects used during demonstrations differ from the objects used during execution, to show that the learning process generalizes across different objects in the environment. For example, in (a), the yellow object was used for demonstration, while the blue object was used for execution. More examples can be found in the supplementary video.