
cc-DRL: a Convex Combined Deep Reinforcement Learning Flight Control Design for a Morphing Quadrotor

Tao Yang, Huai-Ning Wu, Jun-Wei Wang

TL;DR

This paper proposes a convex-combined-DRL (cc-DRL) flight control algorithm for the position and attitude of a class of morphing quadrotors, whose shape change is realized by varying the lengths of their four arm rods.

Abstract

Compared with common quadrotors, the shape change of morphing quadrotors endows them with better flight performance but also results in more complex flight dynamics. It is generally extremely difficult, or even impossible, to establish an accurate mathematical model that describes the complex flight dynamics of morphing quadrotors. To address the flight control design problem for morphing quadrotors, this paper combines model-free control techniques (e.g., deep reinforcement learning, DRL) with the convex combination (CC) technique, and proposes a convex-combined-DRL (cc-DRL) flight control algorithm for the position and attitude of a class of morphing quadrotors, whose shape change is realized by varying the lengths of their four arm rods. In the proposed cc-DRL flight control algorithm, proximal policy optimization (PPO), a model-free DRL algorithm, is used to train offline the corresponding optimal flight control laws for selected representative arm-length modes, and a cc-DRL flight control scheme is then constructed from these laws by the convex combination technique. Finally, simulation results are presented to show the effectiveness and merit of the proposed flight control algorithm.
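The convex combination step can be illustrated with a minimal sketch. It assumes, purely for illustration, that all four arm rods share one scalar length, that the representative modes (the support set) are given as strictly increasing lengths, and that the weights are piecewise-linear between the two nearest modes. The paper's own weighting method (Algorithm 2) may differ. The names `convex_weights` and `cc_control`, and the linear stand-in policies used in the usage example, are hypothetical.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical type: a trained control law maps a state vector to a control vector.
Policy = Callable[[np.ndarray], np.ndarray]


def convex_weights(length: float, support_lengths: Sequence[float]) -> np.ndarray:
    """Return nonnegative weights that sum to 1, interpolating `length`
    between the two nearest representative arm lengths (assumption:
    piecewise-linear weights over a scalar, strictly increasing support set)."""
    L = np.asarray(support_lengths, dtype=float)
    if np.any(np.diff(L) <= 0):
        raise ValueError("support_lengths must be strictly increasing")
    w = np.zeros(len(L))
    # Clamp queries outside the trained range to the nearest endpoint mode
    # instead of extrapolating beyond the trained control laws.
    if length <= L[0]:
        w[0] = 1.0
        return w
    if length >= L[-1]:
        w[-1] = 1.0
        return w
    j = int(np.searchsorted(L, length, side="right"))  # L[j-1] <= length < L[j]
    t = (length - L[j - 1]) / (L[j] - L[j - 1])
    w[j - 1], w[j] = 1.0 - t, t
    return w


def cc_control(state: np.ndarray, length: float,
               support_lengths: Sequence[float],
               policies: Sequence[Policy]) -> np.ndarray:
    """Blend the outputs of the pretrained control laws with convex weights."""
    w = convex_weights(length, support_lengths)
    return sum(wi * pi(state) for wi, pi in zip(w, policies))
```

For example, with hypothetical modes of 0.15, 0.20, and 0.25 m, a query of 0.22 m yields weights (0, 0.6, 0.4), so the blended command is 0.6 times the second law's output plus 0.4 times the third's. Because the weights are nonnegative and sum to 1, the blended command always lies in the convex hull of the individual control laws' outputs.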

Paper Structure (17 sections, 32 equations, 11 figures, 4 tables, 3 algorithms)

This paper contains 17 sections, 32 equations, 11 figures, 4 tables, 3 algorithms.

Figures (11)

  • Figure 1: The structure of the proposed cc-DRL flight control algorithm for an arm-rod-length-varying quadrotor. Algorithm 1 details the DRL algorithm for offline training of the optimal flight control laws for selected representative length modes of the four arm rods. Algorithm 2 gives a convex combination method for arbitrary lengths of the four arm rods, which can be run online or replaced by an offline-pretrained neural network. Algorithm 3 provides a cc-DRL flight control scheme that receives external length-variation commands (query set) for the four arm rods and updates online the combination weights of the trained optimal flight control laws (support set) to achieve near-optimal flight performance.
  • Figure 2: Sketch of a morphing quadrotor with four variable-length arm rods.
  • Figure 3: The structure of networks.
  • Figure 4: The PPO algorithm, which has three parts: Environment, Agent, and ReplayBuffer. The Environment is the quadrotor dynamics, with which the Agent interacts to generate states; the Agent includes an action network and an evaluation network, used for policy learning and state evaluation, respectively; and the ReplayBuffer stores the interaction data.
  • Figure 5: Figure-8 flight trajectory.
  • ...and 6 more figures
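
The PPO training loop in Figure 4 can be sketched through its two core computations: advantage estimation over data stored in the buffer, and the clipped surrogate loss for the action network together with a regression loss for the evaluation network. This is a minimal sketch of the standard PPO-clip formulation with generalized advantage estimation (GAE), not necessarily the exact variant used in the paper. The function names and the hyperparameter defaults (`clip_eps=0.2`, `gamma=0.99`, `lam=0.95`, `value_coef=0.5`) are common choices assumed here for illustration; network forward passes and gradient steps are omitted.

```python
import numpy as np


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory segment.
    Assumption: no episode terminations inside the segment (done flags ignored)."""
    T = len(rewards)
    adv = np.zeros(T)
    next_value, running = last_value, 0.0
    for t in reversed(range(T)):
        # One-step TD error using the evaluation (critic) network's values.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
        next_value = values[t]
    returns = adv + np.asarray(values, dtype=float)  # targets for the critic
    return adv, returns


def ppo_losses(new_logp, old_logp, advantages, values, returns,
               clip_eps=0.2, value_coef=0.5):
    """PPO-clip policy loss and critic loss (both to be minimized)."""
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (lower) surrogate, then negate it to obtain a loss.
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    # The evaluation network regresses its value estimates onto the returns.
    value_loss = value_coef * np.mean((returns - values) ** 2)
    return policy_loss, value_loss
```

Clipping the probability ratio limits how far a single update can move the action network away from the policy that collected the data, which is what lets PPO reuse each batch from the buffer for several gradient steps.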

Theorems & Definitions (2)

  • Remark 1
  • Remark 2