Deep Reinforcement Learning for Sim-to-Real Policy Transfer of VTOL-UAVs Offshore Docking Operations

Ali M. Ali; Aryaman Gupta; Hashim A. Hashim

TL;DR

The paper tackles sim-to-real policy transfer for offshore VTOL-UAV docking by decomposing the task into a model-based approach phase and a learning-based landing phase, with the landing agents trained offline under randomized wave disturbances generated from the JONSWAP spectrum. It evaluates both value-based (DQN variants) and policy-based (PPO) DRL agents and finds that PPO yields more stable and efficient landings under uncertainty, which improves the likelihood of successful transfer to real offshore docks. Domain randomization is applied by drawing a new wave realization for each episode, which is intended to improve generalization and narrow the sim-to-real gap. Among the tested agents, PPO achieves the lowest final impact velocity and the fastest landing times. The work demonstrates the practicality of a two-phase framework with PPO for real-time control in challenging marine environments and suggests incorporating onboard visual feedback in future extensions.

Abstract

This paper proposes a novel Reinforcement Learning (RL) approach for sim-to-real policy transfer of Vertical Take-Off and Landing Unmanned Aerial Vehicles (VTOL-UAVs). The proposed approach is designed for VTOL-UAV landing on offshore docking stations in maritime operations. VTOL-UAVs in maritime operations have a limited operational range, primarily because of constraints imposed by their battery capacity. Autonomous landing on a charging platform can mitigate these limitations by facilitating battery charging and data transfer. However, current Deep Reinforcement Learning (DRL) methods exhibit drawbacks, including lengthy training times and modest success rates. In this paper, we tackle these concerns by decomposing the landing procedure into a sequence of more manageable but analogous tasks: an approach phase and a landing phase. The proposed architecture uses a model-based control scheme for the approach phase, in which the VTOL-UAV approaches the offshore docking station. In the landing phase, DRL agents are trained offline to learn the optimal policy for docking on the offshore station. The Joint North Sea Wave Project (JONSWAP) spectrum model is used to create a wave model for each episode, enhancing policy generalization for sim-to-real transfer. A set of DRL algorithms has been tested through numerical simulations, including value-based agents such as Deep Q-Networks (DQN) and policy-based agents such as Proximal Policy Optimization (PPO). The numerical experiments show that the PPO agent can learn complicated and efficient policies to land in uncertain environments, which in turn enhances the likelihood of successful sim-to-real transfer.
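The per-episode wave model described in the abstract can be illustrated with a minimal sketch, assuming the common random-phase superposition method for turning a JONSWAP spectrum into a wave-elevation time series. The function names, the sampling ranges for significant wave height and peak period, the frequency grid, and the peak-enhancement factor of 3.3 are all assumptions made for illustration and are not taken from the paper. The spectrum is rescaled numerically so that its zeroth moment matches the requested significant wave height.

```python
import numpy as np


def jonswap_spectrum(omega, hs, tp, gamma=3.3):
    """JONSWAP spectral density on a uniform grid of angular frequencies.

    The shape is rescaled so that 4 * sqrt(m0) equals hs, where m0 is the
    zeroth spectral moment. gamma=3.3 is an assumed peak-enhancement factor.
    """
    omega_p = 2.0 * np.pi / tp  # peak angular frequency [rad/s]
    sigma = np.where(omega <= omega_p, 0.07, 0.09)  # spectral width parameter
    peak = np.exp(-((omega - omega_p) ** 2) / (2.0 * sigma**2 * omega_p**2))
    shape = omega**-5.0 * np.exp(-1.25 * (omega_p / omega) ** 4) * gamma**peak
    m0 = np.sum(shape) * (omega[1] - omega[0])  # rectangle-rule integral
    return shape * (hs / 4.0) ** 2 / m0


def sample_wave_episode(rng, t, n_components=200,
                        hs_range=(0.5, 2.5), tp_range=(5.0, 10.0)):
    """Draw one randomized wave-elevation time series for a training episode.

    hs_range [m] and tp_range [s] are hypothetical randomization ranges.
    """
    hs = rng.uniform(*hs_range)  # significant wave height for this episode
    tp = rng.uniform(*tp_range)  # peak period for this episode
    omega = np.linspace(0.2, 3.0, n_components)
    d_omega = omega[1] - omega[0]
    # Component amplitudes from the spectrum, with independent random phases.
    amp = np.sqrt(2.0 * jonswap_spectrum(omega, hs, tp) * d_omega)
    phase = rng.uniform(0.0, 2.0 * np.pi, n_components)
    eta = np.sum(
        amp[:, None] * np.cos(omega[:, None] * t[None, :] + phase[:, None]),
        axis=0,
    )
    return eta, hs, tp
```

In a training loop, a function like `sample_wave_episode` would be called once at each environment reset, so that every episode sees a different sea state and phase realization for the landing pad motion.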

Paper Structure (15 sections, 27 equations, 8 figures, 3 tables, 2 algorithms)

Introduction
Motivation
Preliminaries
Deep Reinforcement Learning
Problem Formulation
Environment
Reward
Value based Agents
DQN
Double DQN
Dueling DQN
Policy based Agents
Proximal Policy Optimization (PPO)
Numerical Results
Conclusions & Future Work

Figures (8)

Figure 1: VTOL-UAV landing task on an offshore landing pad, where $\{\mathcal{B}\}$ refers to the body-fixed frame and $\{\mathcal{I}\}$ is the inertial frame. The distances $\{y_{1},y_{2}\}$ are measured from the VTOL-UAV to the landmarks attached to the docking station and are used in the approach phase. The distances $\{p_{1},p_{2}\}$ are measured from the inertial frame to the corners of the landing pad. For more information about the landmark-based approach phase, see [hashim2023exponentially, hashim2023observer].
Figure 2: DQN Agent architecture interacting with the docking phase environment.
Figure 3: Dueling Architecture of Target & Prediction Neural Networks.
Figure 4: PPO Agent architecture interacting with the docking phase environment.
Figure 5: The $\epsilon$-greedy value used in the DQN agents. The initial and final greedy values were $\epsilon_{0}=1$ and $\epsilon_{f}=0.05$, respectively.
...and 3 more figures
