Learning Visuomotor Policy for Multi-Robot Laser Tag Game

Kai Li; Shiyu Zhao

Abstract

In this paper, we study multi-robot laser tag, a simplified yet practical shooting-game-style task. Classic modular approaches to this task face challenges such as limited observability and reliance on depth mapping and inter-robot communication. To overcome these issues, we present an end-to-end visuomotor policy that maps images directly to robot actions. We train a high-performing teacher policy with multi-agent reinforcement learning and distill its knowledge into a vision-based student policy. Technical designs, including a permutation-invariant feature extractor and a depth heatmap input, improve performance over standard architectures. Our policy outperforms classic methods by 16.7% in hitting accuracy and 6% in collision avoidance, and is successfully deployed on real robots. Code will be released publicly.

Paper Structure

This paper contains 12 sections, 6 equations, 9 figures, and 3 tables.

Introduction
Related Works
Proposed Method
Problem Formulation
Overview
Teacher Policy via Multi-Agent Reinforcement Learning
Student Policy via Vision-Based Imitation Learning
Experimental Evaluations
Implementation Details and Policy Training
Results of the Policy
Comparison with Other Classic Methods
Conclusions

Figures (9)

Figure 1: Illustration of the multi-robot laser tag game. The laser beams shown are virtual for visualization purposes.
Figure 2: Pipeline of our method. The teacher policy takes privileged states as input, outputs a velocity command, and is trained with MARL. The student policy takes a time series of images and outputs a velocity command. The student policy imitates the action output of the teacher and is deployed onboard. The shape of each tensor in the policy is shown. $N$ denotes the number of historical images used for the recurrent module. DATv2 is the monocular depth estimation method Depth Anything v2 [depth_anything_v2].
Figure 3: Structure of the feature extractor of the teacher policy. Opponent, obstacle, and teammate states are encoded with self-attention and summation pooling. Each entity token (white circle) in the figure represents one embedded instance of an obstacle or a neighbor robot.
Figure 4: The original image is passed to YOLOv5 and DATv2 to generate detection results and a depth image. The detection results contain two classes, namely the enemy and the ally. A heat map that indicates the detection locations is generated for each class. The two heat maps are then concatenated with the depth image along the channel axis to form the input tensor.
Figure 5: The first row shows enemy and ally detection results, the second row shows depth maps, and the third row shows full-gradient attention maps [srinivas2019full]. The attention maps highlight task-relevant regions of the input image, typically focusing on enemy targets and obstacles, consistent with human intuition. The first three columns present simulation data, and the last column presents real-world data.
...and 4 more figures
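
The input construction described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption only says that each heat map "indicates the detection location", so the Gaussian-blob rendering, the `sigma_scale` parameter, the function names `detection_heatmap` and `build_input_tensor`, and the channel order (enemy, ally, depth) are all hypothetical choices. Bounding boxes and the depth image are assumed to come from a detector and a depth estimator that are not shown here.

```python
import numpy as np


def detection_heatmap(boxes, height, width, sigma_scale=0.25):
    """Render one heat map from (x_min, y_min, x_max, y_max) pixel boxes.

    Assumption: each detection is drawn as a Gaussian blob centred on its
    box, with a spread proportional to the box size. The source does not
    specify how the heat map is rendered.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    heat = np.zeros((height, width), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        cx, cy = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
        sx = max(sigma_scale * (x1 - x0), 1.0)
        sy = max(sigma_scale * (y1 - y0), 1.0)
        blob = np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))
        # Overlapping detections keep the strongest response per pixel.
        heat = np.maximum(heat, blob)
    return heat


def build_input_tensor(enemy_boxes, ally_boxes, depth):
    """Stack enemy heat map, ally heat map, and depth into a (3, H, W) array.

    Assumption: channels-first layout with order (enemy, ally, depth).
    """
    height, width = depth.shape
    enemy = detection_heatmap(enemy_boxes, height, width)
    ally = detection_heatmap(ally_boxes, height, width)
    return np.stack([enemy, ally, depth.astype(np.float32)], axis=0)


if __name__ == "__main__":
    depth = np.ones((48, 64), dtype=np.float32)  # placeholder depth image
    x = build_input_tensor([(10, 10, 20, 20)], [], depth)
    print(x.shape)
```

With no ally detections, the ally channel is all zeros, so the tensor shape stays fixed regardless of how many robots are visible in the image.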
