Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

Romain Poletti, Lorenzo Schena, Lilla Koloszar, Joris Degroote, Miguel Alfonso Mendez

TL;DR

This paper addresses the control of flapping-wing drones, whose dynamics are time-varying, nonlinear, and underactuated, under noisy sensing. It proposes Reinforcement Twinning (RT), a hybrid framework that jointly trains a model-free (MF) policy via actor-critic reinforcement learning and a model-based (MB) policy via adjoint-based data assimilation, with a policy referee and a trust-based switching mechanism to balance learning efficiency and model reliability. The real environment, a nonlinear time-varying (NLTV) flight model, is approximated by a nonlinear time-invariant (NLTI) surrogate through a polynomial closure, enabling online model identification and MB policy optimization. Across the offline-calibration, online-twinning, and biased-model scenarios, RT outperforms the purely MB and purely MF approaches, converging faster and performing robustly, which indicates potential for real-time, adaptive control of evolving flapping-wing micro air vehicles (FWMAVs). The approach paves the way for extension to 3D control and tighter MB–MF integration in dynamically changing aerial systems.

Abstract

Controlling the flight of flapping-wing drones requires versatile controllers that handle their time-varying, nonlinear, and underactuated dynamics from incomplete and noisy sensor data. Model-based methods struggle with accurate modeling, while model-free approaches falter in efficiently navigating very high-dimensional and nonlinear control objective landscapes. This article presents a novel hybrid model-free/model-based approach to flight control based on the recently proposed reinforcement twinning algorithm. The model-based (MB) approach relies on an adjoint formulation using an adaptive digital twin, continuously identified from live trajectories, while the model-free (MF) approach relies on reinforcement learning. The two agents collaborate through transfer learning, imitation learning, and experience sharing using the real environment, the digital twin, and a referee. The referee selects the best agent to interact with the real environment based on performance within the digital twin and a real-to-virtual environment consistency ratio. The algorithm is evaluated for controlling the longitudinal dynamics of a flapping-wing drone, with the environment simulated as a nonlinear, time-varying dynamical system under the influence of quasi-steady aerodynamic forces. The hybrid control learning approach is tested with three types of initialization of the adaptive model: (1) offline identification using previously available data, (2) random initialization with full online identification, and (3) offline pre-training with an estimation bias, followed by online adaptation. In all three scenarios, the proposed hybrid learning approach demonstrates superior performance compared to purely model-free and model-based methods.
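The referee's selection rule described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the names `RefereeInputs`, `consistency_ratio`, and `select_live_policy`, the definition of the ratio as real cost divided by virtual cost, the `tolerance` threshold, and the fallback to the model-free policy when the digital twin is not trusted.

```python
from dataclasses import dataclass


@dataclass
class RefereeInputs:
    """Hypothetical container for the scalar costs the referee compares."""
    cost_mb_virtual: float    # cost of the MB policy evaluated in the digital twin
    cost_mf_virtual: float    # cost of the MF policy evaluated in the digital twin
    cost_live_real: float     # cost of the last live episode in the real environment
    cost_live_virtual: float  # cost of that same policy replayed in the digital twin


def consistency_ratio(inputs: RefereeInputs, eps: float = 1e-12) -> float:
    """Assumed real-to-virtual ratio: 1.0 means the twin reproduces the real cost."""
    return inputs.cost_live_real / max(inputs.cost_live_virtual, eps)


def select_live_policy(inputs: RefereeInputs, tolerance: float = 0.2) -> str:
    """Return "MB" or "MF" for the next interaction with the real environment.

    Assumed rule: the twin is trusted when the ratio is within `tolerance`
    of 1.0. A trusted twin lets the policy with the lower virtual cost go
    live; an untrusted twin falls back to the model-free policy.
    """
    twin_trusted = abs(consistency_ratio(inputs) - 1.0) <= tolerance
    if twin_trusted and inputs.cost_mb_virtual < inputs.cost_mf_virtual:
        return "MB"
    return "MF"


# Usage: the twin matches the real cost within 5 %, and MB is cheaper in the twin.
choice = select_live_policy(RefereeInputs(1.0, 2.0, 1.05, 1.0))
```

The sketch only shows how a trust measure can gate a comparison made inside the virtual environment; the paper's actual referee also coordinates transfer learning, imitation learning, and experience sharing between the two agents, which is not modeled here.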

Paper Structure

This paper contains 17 sections, 23 equations, 11 figures, 1 table, 1 algorithm.

Figures (11)

  • Figure 1: (a) Schematic of the flapping-wing drone control problem and (b) focus on the wing kinematics defined by the flapping angle $\phi$ and the pitching angle $\alpha$
  • Figure 2: Block diagram of the Reinforcement Twinning algorithm hybridizing a model-free (1.a) and model-based (1.b) policy to train a control agent $\pi$ (1.) using a virtual environment (8.) assimilated from live data of the real environment (3.).
  • Figure 3: Block diagram of the policy referee showing the cooperation mechanisms between the model-based and model-free policy used to define the live policy
  • Figure 4: (a) Comparison of open-loop trajectories generated by the real environment and the virtual environment (Model 3 (M3) and Model 5 (M5)) using the testing dataset, and (b) detailed view of the $x$, $z$, and $\theta$ states of the trajectories over time.
  • Figure 5: Scatter plot of the trajectories computed with the real and the virtual environment for five closure laws assimilated from the real trajectories.
  • ...and 6 more figures