Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones
Romain Poletti, Lorenzo Schena, Lilla Koloszar, Joris Degroote, Miguel Alfonso Mendez
TL;DR
This paper addresses the challenge of controlling flapping-wing drones with time-varying, nonlinear, and underactuated dynamics under noisy sensing. It proposes Reinforcement Twinning (RT), a hybrid framework that jointly trains a model-free (MF) policy via actor-critic reinforcement learning (RL) and a model-based (MB) policy via adjoint-based data assimilation, with a policy referee and a trust-based switching mechanism to balance learning efficiency and model reliability. The real environment, a nonlinear time-varying (NLTV) flight model, is approximated by a nonlinear time-invariant (NLTI) surrogate through a polynomial closure, enabling online model identification and MB policy optimization. Across offline calibration, online twinning, and biased-model scenarios, RT consistently outperforms purely MB or MF approaches, converging faster and performing robustly, which suggests potential for real-time, adaptive control of evolving flapping-wing micro air vehicles (FWMAVs). The approach paves the way for extension to 3D control and tighter MB–MF integration in dynamically changing aerial systems.
Abstract
Controlling the flight of flapping-wing drones requires versatile controllers that handle their time-varying, nonlinear, and underactuated dynamics from incomplete and noisy sensor data. Model-based methods struggle with accurate modeling, while model-free approaches falter in efficiently navigating very high-dimensional and nonlinear control objective landscapes. This article presents a novel hybrid model-free/model-based approach to flight control based on the recently proposed reinforcement twinning algorithm. The model-based (MB) approach relies on an adjoint formulation using an adaptive digital twin, continuously identified from live trajectories, while the model-free (MF) approach relies on reinforcement learning. The two agents collaborate through transfer learning, imitation learning, and experience sharing using the real environment, the digital twin, and a referee. The referee selects the best agent to interact with the real environment based on performance within the digital twin and a real-to-virtual environment consistency ratio. The algorithm is evaluated for controlling the longitudinal dynamics of a flapping-wing drone, with the environment simulated as a nonlinear, time-varying dynamical system under the influence of quasi-steady aerodynamic forces. The hybrid control learning approach is tested with three types of initialization of the adaptive model: (1) offline identification using previously available data, (2) random initialization with full online identification, and (3) offline pre-training with an estimation bias, followed by online adaptation. In all three scenarios, the proposed hybrid learning approach demonstrates superior performance compared to purely model-free and model-based methods.
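
To make the referee's role concrete, the following is a minimal sketch of how an agent could be chosen from twin performance and a real-to-virtual consistency ratio. It is not the selection rule used in the paper, which the abstract does not specify. The names `RefereeInputs`, `consistency_ratio`, and `select_agent`, the definition of the ratio as real cost over twin-predicted cost, the `tolerance` threshold, and the fallback to the model-free agent when the twin is not trusted are all assumptions made for illustration. Costs are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class RefereeInputs:
    """Hypothetical quantities available to the referee after an episode."""
    cost_mb_twin: float  # cost of the model-based policy evaluated in the digital twin
    cost_mf_twin: float  # cost of the model-free policy evaluated in the digital twin
    cost_real: float     # cost measured in the real environment for the last episode
    cost_virtual: float  # cost the digital twin predicted for that same episode


def consistency_ratio(cost_real: float, cost_virtual: float, eps: float = 1e-12) -> float:
    """Assumed real-to-virtual ratio: 1.0 means the twin reproduced the real cost."""
    return cost_real / max(cost_virtual, eps)


def select_agent(inp: RefereeInputs, tolerance: float = 0.2) -> str:
    """Return "MB" or "MF" for the next interaction with the real environment.

    Assumed rule: the twin's ranking is used only when the consistency ratio
    is within `tolerance` of 1; otherwise the model-free agent is selected,
    since it does not depend on the twin's accuracy.
    """
    ratio = consistency_ratio(inp.cost_real, inp.cost_virtual)
    twin_trusted = abs(ratio - 1.0) <= tolerance
    if not twin_trusted:
        return "MF"
    return "MB" if inp.cost_mb_twin <= inp.cost_mf_twin else "MF"
```

For example, `select_agent(RefereeInputs(1.0, 2.0, 1.0, 1.05))` selects the model-based agent because the twin is consistent with the real environment and ranks that agent better. With a real cost of 3.0 against a predicted 1.0, the same ranking is ignored and the model-free agent is selected.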
