Learning Hybrid Policies for MPC with Application to Drone Flight in Unknown Dynamic Environments

Zhaohan Feng; Jie Chen; Wei Xiao; Jian Sun; Bin Xin; Gang Wang

TL;DR

This work addresses autonomous drone traversal through swinging gates with unknown dynamics by proposing hyMPC, a hybrid control framework that combines parameterized MPC with learned high-level decisions. A high-level Gaussian policy determines the mix between two MPC subtasks (gate following and gate traversing), while an online model predicts the gate motion to supply real-time references. Policy search is performed episodically, and deep neural networks are trained offline to output the preferred traversal time and mixing weights. In simulation, hyMPC achieves near-perfect success rates and smaller traversal errors than the baselines across varying initial distances and under thrust perturbations, including in multi-gate scenarios. The findings suggest that hyMPC adapts robustly and data-efficiently to unknown environmental dynamics, with practical implications for real-world drone operation in dynamic environments.
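
The episodic search over a Gaussian policy described above can be illustrated with a minimal sketch. It is not the paper's update rule: it assumes a generic reward-weighted maximum-likelihood update of a diagonal Gaussian, and it replaces the MPC rollout with a hypothetical toy cost. The names `toy_episode_cost` and `policy_search`, the temperature `beta`, and the target values are all assumptions made for illustration.

```python
import numpy as np


def toy_episode_cost(z):
    """Hypothetical stand-in for one MPC rollout.

    z = (traversal_time, mixing_weight); the optimum below is made up.
    """
    target = np.array([2.0, 0.7])
    return float(np.sum((z - target) ** 2))


def policy_search(cost_fn, mean, std, iters=30, samples=40, beta=3.0, seed=0):
    """Reward-weighted update of a diagonal Gaussian over decision variables."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(iters):
        # Sample one decision vector per episode and evaluate its cost.
        z = mean + std * rng.standard_normal((samples, mean.size))
        costs = np.array([cost_fn(zi) for zi in z])
        # Exponential weights: a lower cost receives a larger weight.
        spread = costs.max() - costs.min() + 1e-12
        w = np.exp(-beta * (costs - costs.min()) / spread)
        w /= w.sum()
        # Weighted maximum-likelihood fit of the mean and standard deviation.
        mean = w @ z
        std = np.sqrt(w @ (z - mean) ** 2) + 1e-6
    return mean, std


mean, std = policy_search(toy_episode_cost, mean=[1.0, 0.2], std=[1.0, 0.5])
```

In a setting like the one summarized here, the cost function would instead run a full episode with the sampled decision variables and score the resulting flight; the sampled pairs and their outcomes could then serve as training data for a neural network policy.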

Abstract

In recent years, drones have been applied to an increasingly wide range of real-world tasks. Model predictive control (MPC) has emerged as a practical method for drone flight control, owing to its robustness against modeling errors, uncertainties, and external disturbances. However, MPC is sensitive to manually tuned parameters, which can lead to rapid performance degradation under unknown environmental dynamics. This paper addresses the challenge of controlling a drone as it traverses a swinging gate with unknown dynamics. It introduces a parameterized MPC approach, named hyMPC, that uses high-level decision variables to adapt to uncertain environmental conditions. To derive these decision variables, a novel policy search framework for training a high-level Gaussian policy is presented. Neural network policies, trained on data gathered by repeatedly executing the Gaussian policy, then provide the decision variables in real time. The effectiveness of hyMPC is validated through numerical simulations, in which it achieves a 100% success rate over 20 flight tests traversing a swinging gate, demonstrating safe and precise flight with limited prior knowledge of the environmental dynamics.

Paper Structure (15 sections, 16 equations, 5 figures, 3 tables)

Introduction
Preliminaries and System Modeling
Drone dynamics
Model predictive control
Policy search
RL-driven MPC Trajectory Optimization
Episode-based Policy Search
Policy update
Deep policy learning
Simulation Results
Simulation settings
Comparison
Robustness test
Multi-gate traversing
Conclusions

Figures (5)

Figure 1: A comparison between the proposed hyMPC, high-MPC, and standard MPC. (a) hyMPC with unknown gate dynamics. (b) high-MPC with known gate dynamics. (c) The standard MPC with unknown gate dynamics.
Figure 2: Average traversal error and traversal time of the Gaussian policy, the neural network policies, high-MPC, and manual-MPC.
Figure 3: A flight in which the drone is initialized $1\,\mathrm{m}$ away from the gate in the axial direction. At the start, the drone chooses to turn left and backtrack to create sufficient axial space for the traversal task.
Figure 4: A comparison between the proposed framework hyMPC and high-MPC for different initial axial distances between the drone and the gate. In each subfigure, the left panel shows the flight produced by hyMPC and the right panel shows the flight produced by high-MPC. (a) Initial distance of $1\,\mathrm{m}$. (b) Initial distance of $2\,\mathrm{m}$. (c) Initial distance of $3\,\mathrm{m}$. (d) Initial distance of $4\,\mathrm{m}$. (e) Initial distance of $5\,\mathrm{m}$. (f) Initial distance of $6\,\mathrm{m}$.
Figure 5: Average traversal error and traversal time of the neural network policies in the multi-gate traversing task.