Multi-Agent Generative Adversarial Interactive Self-Imitation Learning for AUV Formation Control and Obstacle Avoidance

Zheng Fang, Tianhao Chen, Dong Jiang, Zheng Zhang, Guangliang Li

TL;DR

This work tackles multi-AUV formation control and obstacle avoidance by addressing imitation learning's reliance on optimal expert demonstrations. It introduces MAGAISIL, which builds on MAGAIL and iteratively replaces sub-optimal expert trajectories with self-generated trajectories validated by a human trainer; it is implemented with IPPO and uses a discriminator to shape rewards. In experiments on three Gazebo-based tasks, MAGAISIL achieves performance close to or better than MAGAIL with optimal demonstrations and adapts to more complex configurations. The approach reduces the need for optimal demonstrations while maintaining high performance, which is a practical benefit for real-world multi-AUV coordination.

Abstract

Multiple autonomous underwater vehicles (multi-AUV) can cooperatively accomplish tasks that a single AUV cannot complete. Recently, multi-agent reinforcement learning has been introduced for multi-AUV control. However, designing efficient reward functions for the various multi-AUV control tasks is difficult or even impractical. Multi-agent generative adversarial imitation learning (MAGAIL) allows multiple AUVs to learn from expert demonstrations instead of pre-defined reward functions, but it requires optimal demonstrations and cannot surpass the demonstrations it is given. This paper builds upon the MAGAIL algorithm by proposing multi-agent generative adversarial interactive self-imitation learning (MAGAISIL), which lets AUVs learn policies by gradually replacing the provided sub-optimal demonstrations with self-generated good trajectories selected by a human trainer. Our experimental results on a multi-AUV formation control and obstacle avoidance task, run on the Gazebo platform with our lab's AUV simulator, show that AUVs trained via MAGAISIL can surpass the provided sub-optimal expert demonstrations and reach performance close to or even better than MAGAIL with optimal demonstrations. Further results indicate that policies trained via MAGAISIL adapt to complex and different tasks as well as those trained via MAGAIL from optimal demonstrations.
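
The demonstration-replacement idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Trajectory`, `DemonstrationBuffer`, `offer`, and `approved_by_human` are hypothetical, the human trainer is modelled as a yes/no callback, and the rule that an approved trajectory evicts the oldest stored demonstration is an assumption, since the abstract does not say which demonstration is replaced.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterable, List, Tuple


@dataclass(frozen=True)
class Trajectory:
    """One rollout. The (state, action) layout is a hypothetical simplification."""
    source: str  # "expert" for provided demos, "self" for agent rollouts
    steps: Tuple[Tuple[float, float], ...]


class DemonstrationBuffer:
    """Fixed-size pool of demonstrations that a discriminator would sample from."""

    def __init__(self, initial_demos: Iterable[Trajectory], capacity: int) -> None:
        # A bounded deque drops its oldest item when a new one is appended
        # (assumed eviction rule, not taken from the paper).
        self._demos: Deque[Trajectory] = deque(initial_demos, maxlen=capacity)

    def offer(
        self,
        rollout: Trajectory,
        approved_by_human: Callable[[Trajectory], bool],
    ) -> bool:
        """Add a self-generated rollout only if the human trainer approves it."""
        if not approved_by_human(rollout):
            return False
        self._demos.append(rollout)
        return True

    def demos(self) -> List[Trajectory]:
        """Return the current demonstrations, oldest first."""
        return list(self._demos)

    def fraction_self_generated(self) -> float:
        """Share of the pool that has been replaced by self-generated rollouts."""
        if not self._demos:
            return 0.0
        n_self = sum(1 for t in self._demos if t.source == "self")
        return n_self / len(self._demos)
```

In a training loop, each batch of rollouts would be passed through `offer`, so the pool drifts from the provided sub-optimal demonstrations toward approved self-generated ones while the discriminator keeps learning to tell the pool apart from the current policy's behaviour.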

Paper Structure (14 sections, 6 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Illustration of the mechanism for our multi-agent generative adversarial interactive self-imitation learning (MAGAISIL) method.
  • Figure 2: Screenshot of the Gazebo simulation platform with a leader AUV and a follower AUV in the simulated underwater environment.
  • Figure 3: Illustration of the configurations of Tasks I, II, and III in the underwater environment for formation control and obstacle avoidance.
  • Figure 4: The state representation of the leader and follower AUVs in the tasks.
  • Figure 5: Cumulative rewards received by the leader AUV and follower AUV trained via MAGAISIL with sub-optimal demonstrations (MAGAISIL Optimal), MAGAIL with sub-optimal demonstrations (MAGAIL Suboptimal), and MAGAIL with optimal demonstrations (MAGAIL Optimal) in Task I. The shaded area is the 95% confidence interval and the bold line is the mean performance over three experimental trials. The two red lines show the performance of the expert optimal and sub-optimal demonstrations.
  • ...and 5 more figures