Learning from Demonstration Framework for Multi-Robot Systems Using Interaction Keypoints and Soft Actor-Critic Methods
Vishnunandan L. N. Venkatesh, Byung-Cheol Min
TL;DR
This work addresses learning from demonstration (LfD) in multi-robot systems by combining visual demonstrations, Interaction Keypoints (IKs), and Soft Actor-Critic (SAC) policies. It proposes a vision-based LfD framework with four modules: Vision Tracking, Task Policy Inference, RL Skill Learning, and Robot Execution. For unseen contact skills, a classifier-based reward guides RL. The reward is defined as $R = w_1 C(I_n) + w_2\,\mathrm{IK}_{\text{reward}} - w_3\,\mathrm{IK}_{\text{fail penalty}}$, a weighted combination of a classifier term $C(I_n)$, an IK-based reward, and a penalty for IK failures. The approach learns both behavior-based and contact-based skills from a single demonstration for many tasks. It is demonstrated on Intruder Attack, Leader Follower, Object Transport, Object Rotate, and Object Color Sorting with real Hamster robots, achieving high success rates and showing robustness to changes in the objects. The results suggest that the framework supports real-time learning amenable to sim-to-real transfer, reduces the number of demonstrations required, and could be extended to heterogeneous robots and trajectory-based skills.
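The reward above is a plain weighted sum, so it can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight values, the assumption that $C(I_n)$ is a success probability in $[0, 1]$, and the helper `ik_distance_reward` (a negative distance to a target keypoint) are all hypothetical choices made only for this example.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RewardWeights:
    """Weights w1, w2, w3 of the reward terms (values are hypothetical)."""
    w1: float = 1.0  # weight on the classifier term C(I_n)
    w2: float = 0.5  # weight on the IK-based reward
    w3: float = 1.0  # weight on the IK failure penalty


def ik_distance_reward(robot_xy: Tuple[float, float],
                       keypoint_xy: Tuple[float, float]) -> float:
    """Hypothetical IK reward: negative Euclidean distance to a keypoint."""
    return -math.dist(robot_xy, keypoint_xy)


def lfd_reward(classifier_score: float,
               ik_reward: float,
               ik_fail_penalty: float,
               weights: RewardWeights = RewardWeights()) -> float:
    """R = w1 * C(I_n) + w2 * IK_reward - w3 * IK_fail_penalty.

    classifier_score is the output of an (assumed) success classifier C on
    the current observation I_n, treated here as a probability in [0, 1].
    """
    if not 0.0 <= classifier_score <= 1.0:
        raise ValueError("classifier_score is assumed to lie in [0, 1]")
    return (weights.w1 * classifier_score
            + weights.w2 * ik_reward
            - weights.w3 * ik_fail_penalty)


if __name__ == "__main__":
    # Robot at the origin, target keypoint at (3, 4): IK reward is -5.
    r = lfd_reward(0.8, ik_distance_reward((0.0, 0.0), (3.0, 4.0)), 0.0)
    print(r)
```

With the default weights, the usage example evaluates to roughly $0.8 + 0.5 \cdot (-5) - 0 = -1.7$. Because the classifier term supplies the task-success signal, the IK terms can be kept small and act as shaping rather than as a hand-engineered objective.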
Abstract
Learning from Demonstration (LfD) is a promising approach to enable Multi-Robot Systems (MRS) to acquire complex skills and behaviors. However, the intricate interactions and coordination challenges in MRS pose significant hurdles for effective LfD. In this paper, we present a novel LfD framework specifically designed for MRS, which leverages visual demonstrations to capture and learn from robot-robot and robot-object interactions. Our framework introduces the concept of Interaction Keypoints (IKs) to transform the visual demonstrations into a representation that facilitates the inference of various skills necessary for the task. The robots then execute the task using sensorimotor actions and reinforcement learning (RL) policies when required. A key feature of our approach is the ability to handle unseen contact-based skills that emerge during the demonstration. In such cases, RL is employed to learn the skill using a classifier-based reward function, eliminating the need for manual reward engineering and ensuring adaptability to environmental changes. We evaluate our framework across a range of mobile robot tasks, covering both behavior-based and contact-based domains. The results demonstrate the effectiveness of our approach in enabling robots to learn complex multi-robot tasks and behaviors from visual demonstrations.
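The execution logic described in the abstract (sensorimotor actions for skills the robots already have, RL only for unseen contact-based skills) amounts to a routing step over the skills inferred from the demonstration. The sketch below is hypothetical: the skill names, the `KNOWN_PRIMITIVES` table, the string-valued "plan", and the choice to reject unseen behavior skills are assumptions for illustration only.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Tuple


class SkillKind(Enum):
    BEHAVIOR = auto()  # e.g. follow a leader, intercept an intruder
    CONTACT = auto()   # e.g. push, rotate, or transport an object


# Hypothetical library of sensorimotor primitives the robots already have.
KNOWN_PRIMITIVES: Dict[str, Callable[[], str]] = {
    "follow": lambda: "sensorimotor:follow",
    "move_to": lambda: "sensorimotor:move_to",
}


def plan_execution(inferred_skills: List[Tuple[str, SkillKind]]) -> List[str]:
    """Route each skill inferred from a demonstration to an executor.

    Known skills run as sensorimotor primitives; unseen contact-based skills
    are flagged for RL training (e.g. SAC with a classifier-based reward).
    Unseen behavior skills are rejected here, which is an assumption.
    """
    plan: List[str] = []
    for name, kind in inferred_skills:
        if name in KNOWN_PRIMITIVES:
            plan.append(KNOWN_PRIMITIVES[name]())
        elif kind is SkillKind.CONTACT:
            plan.append(f"rl:train_and_run:{name}")
        else:
            raise ValueError(f"no executor for unseen behavior skill {name!r}")
    return plan


if __name__ == "__main__":
    demo_skills = [("follow", SkillKind.BEHAVIOR),
                   ("rotate_object", SkillKind.CONTACT)]
    print(plan_execution(demo_skills))
```

Keeping RL behind this check means training cost is paid only for the contact skills the demonstration actually introduces, while everything else reuses existing controllers.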
