RaCIL: Ray Tracing based Multi-UAV Obstacle Avoidance through Composite Imitation Learning

Harsh Bansal; Vyom Goyal; Bhaskar Joshi; Akhil Gupta; Harikumar Kandath

TL;DR

This work tackles UAV obstacle avoidance in dynamic, multi-UAV environments by proposing RaCIL, a composite imitation learning framework that fuses PPO with Behavioral Cloning and Generative Adversarial Imitation Learning, enhanced by ray-tracing-based observations. The approach demonstrates that incorporating ray tracing improves obstacle detection and accelerates training, while GAIL promotes coordinated flight among multiple UAVs, yielding higher success rates than BC alone. Evaluation in Unity ML-Agents shows scalable performance from 1 to 3 UAVs, with gains in reward and safety metrics, suggesting practical potential for autonomous UAV operations in crowded or dynamic scenarios. The findings highlight the value of combining ray-tracing perception with imitation learning for robust, scalable UAV navigation; future work targets a 3D extension and real-world deployment.
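The "composite" objective can be pictured as a PPO clipped surrogate loss plus a weighted Behavioral Cloning term. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: continuous actions, BC measured as mean-squared error against demonstrated actions, and a hypothetical weight `bc_weight`. The GAIL contribution is assumed to enter through the advantages (via the reward signal) rather than as a separate loss term here.

```python
import numpy as np


def ppo_clip_loss(ratio: np.ndarray, advantage: np.ndarray, eps: float = 0.2) -> float:
    """Negated PPO clipped surrogate objective, so that lower is better."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO keeps the pessimistic (element-wise minimum) of the two surrogates.
    return float(-np.mean(np.minimum(unclipped, clipped)))


def bc_loss(policy_actions: np.ndarray, demo_actions: np.ndarray) -> float:
    """Behavioral Cloning loss: mean-squared error to demonstrated continuous actions."""
    return float(np.mean((policy_actions - demo_actions) ** 2))


def composite_loss(
    ratio: np.ndarray,
    advantage: np.ndarray,
    policy_actions: np.ndarray,
    demo_actions: np.ndarray,
    bc_weight: float = 0.5,  # hypothetical weighting, not taken from the paper
    eps: float = 0.2,
) -> float:
    """Hypothetical composite objective: PPO loss plus a weighted BC term."""
    return ppo_clip_loss(ratio, advantage, eps) + bc_weight * bc_loss(policy_actions, demo_actions)


# Example: probability ratios pi_new/pi_old and advantages for three sampled steps.
ratio = np.array([1.0, 1.5, 0.5])
advantage = np.array([1.0, 1.0, -1.0])
loss = composite_loss(ratio, advantage, np.array([[0.0, 1.0]]), np.array([[0.0, 0.0]]))
print(round(loss, 4))  # -0.2167
```

Keeping BC as an additive loss lets its influence be tuned (or annealed) independently of the reinforcement signal, which is one common way to blend demonstrations with PPO.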

Abstract

In this study, we address the challenge of obstacle avoidance for Unmanned Aerial Vehicles (UAVs) through an innovative composite imitation learning approach that combines Proximal Policy Optimization (PPO) with Behavior Cloning (BC) and Generative Adversarial Imitation Learning (GAIL), enriched with ray-tracing techniques. Our research underscores the significant role of ray tracing in improving obstacle detection and avoidance. Moreover, we demonstrate that incorporating GAIL helps coordinate the flight paths of two UAVs, improving collision avoidance. Extending our methodology, we apply the combined PPO, BC, GAIL, and ray-tracing framework to scenarios involving four UAVs, illustrating its scalability and adaptability to more complex settings. The findings indicate that our approach not only improves the reliability of basic PPO-based obstacle avoidance but also paves the way for advanced autonomous UAV operations in crowded or dynamic environments.
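To illustrate what a ray-tracing observation can look like, the sketch below casts a fan of rays from a UAV in a 2D plane and records, for each ray, the normalized distance to the nearest obstacle. The circular obstacle model, the ray count, and the `max_range` cutoff are assumptions for illustration; the paper's actual sensor configuration is not reproduced here, and the 2D setting simply mirrors the fact that a 3D extension is listed as future work.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical obstacle model: circles given as (center_x, center_y, radius).
Obstacle = Tuple[float, float, float]


def ray_circle_distance(
    origin: Tuple[float, float], direction: Tuple[float, float], obstacle: Obstacle
) -> float:
    """Distance along a unit-length ray to a circle; math.inf if the ray misses."""
    ox, oy = origin
    dx, dy = direction
    cx, cy, r = obstacle
    fx, fy = ox - cx, oy - cy
    # Solve |f + t*d|^2 = r^2, i.e. t^2 + 2*b*t + c = 0 for a unit direction d.
    b = fx * dx + fy * dy
    c = fx * fx + fy * fy - r * r
    if c <= 0.0:
        return 0.0  # the ray origin is already inside (or on) the obstacle
    disc = b * b - c
    if disc < 0.0:
        return math.inf  # the ray's line never touches the circle
    t = -b - math.sqrt(disc)  # nearer of the two roots
    return t if t >= 0.0 else math.inf  # negative t: the circle is behind the UAV


def cast_rays(
    position: Tuple[float, float],
    heading: float,
    obstacles: Sequence[Obstacle],
    num_rays: int = 8,
    max_range: float = 10.0,
) -> List[float]:
    """Normalized hit distances in [0, 1] for rays spread evenly over 360 degrees.

    A reading of 1.0 means no obstacle was detected within max_range.
    """
    readings = []
    for i in range(num_rays):
        angle = heading + 2.0 * math.pi * i / num_rays
        direction = (math.cos(angle), math.sin(angle))
        nearest = min(
            (ray_circle_distance(position, direction, ob) for ob in obstacles),
            default=math.inf,
        )
        readings.append(min(nearest, max_range) / max_range)
    return readings


# One obstacle of radius 1 centred 5 units ahead of a UAV at the origin.
print(cast_rays((0.0, 0.0), 0.0, [(5.0, 0.0, 1.0)], num_rays=4))
# [0.4, 1.0, 1.0, 1.0]
```

Normalizing by `max_range` keeps every reading in a fixed interval, which is convenient as input to a policy network; other UAVs could be treated as additional (moving) circles in the same way.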

Paper Structure (20 sections, 21 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Background
Preliminaries and Problem Formulation
UAV Model
Proximal Policy Optimization (PPO)
Imitation Learning
Behavioral Cloning
Generative Adversarial Imitation Learning (GAIL)
Problem Formulation
Ray Tracing based Composite Imitation Learning (RaCIL)
The Environment
Observation Space
Action Space
Reward Function
The Agent
...and 5 more sections

Figures (5)

Figure 1: UAV Obstacle Avoidance: A UAV (Blue) aims to reach its goal while navigating around obstacles (Black) and avoiding collisions with other UAVs (Purple, Yellow) in a shared environment.
Figure 2: System Architecture: At time $t$, the agent takes action $A_t$ according to the policy, causing the environment to transition to the next state $S_t$. Observations from this state ($O_t$) are sent to the GAIL discriminator. The rewards collected from the environment ($R_{\text{ext}}$) and from GAIL ($R_{\text{GAIL}}$) are used to compute the cumulative loss, which drives the policy update.
Figure 3: Ray Tracing Scenario: This figure depicts how a UAV collects observations using ray tracing.
Figure 4: Training Results for Study 1: Performance comparison of UAV navigation with and without Ray Tracing. Fig. 4(a) illustrates the progression of mean rewards throughout the training process, highlighting the effectiveness of Ray Tracing in enhancing navigational decisions. Fig. 4(b) shows the episode length progression, indicating improved efficiency and learning speed when Ray Tracing is utilized.
Figure 5: Policy Training Results for Study 2: Comparative analysis of UAV policy training using Behavior Cloning (BC) and BC integrated with Generative Adversarial Imitation Learning (GAIL). Fig. 5(a) shows the mean reward progression, comparing the efficacy of BC alone with the combined BC + GAIL approach. Fig. 5(b) illustrates the episode length progression, further demonstrating the impact of GAIL on learning efficiency.
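Figure 2 combines an extrinsic environment reward with a GAIL reward derived from a discriminator. The following is a minimal sketch, assuming the discriminator outputs the probability that an observation came from the expert demonstrations and using the common GAIL reward form -log(1 - D). The strengths `ext_strength` and `gail_strength` are hypothetical weights, not values reported in the paper.

```python
import numpy as np


def gail_reward(d_expert, eps: float = 1e-7) -> np.ndarray:
    """GAIL-style reward -log(1 - D): larger when the discriminator rates a sample as expert-like."""
    d = np.clip(np.asarray(d_expert, dtype=float), 0.0, 1.0 - eps)  # avoid log(0)
    return -np.log(1.0 - d)


def combined_reward(r_ext, d_expert, ext_strength: float = 1.0, gail_strength: float = 0.1) -> np.ndarray:
    """Weighted sum of the extrinsic reward R_ext and the GAIL reward R_GAIL (weights hypothetical)."""
    return ext_strength * np.asarray(r_ext, dtype=float) + gail_strength * gail_reward(d_expert)


# Discriminator scores for three observations: non-expert-like, uncertain, expert-like.
print(combined_reward([1.0, 1.0, 1.0], [0.0, 0.5, 0.9]))
```

With a small `gail_strength`, the environment reward still dominates, while the discriminator term nudges the policy toward demonstration-like behavior such as coordinated flight among UAVs.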
