Dreaming Falcon: Physics-Informed Model-Based Reinforcement Learning for Quadcopters
Eashan Vytla, Bhavanishankar Kalavakolanu, Andrew Perrault, Matthew McCrink
TL;DR
This work tackles robust quadcopter control under dynamic conditions by combining a Dreamer-style online model-based reinforcement learning framework with a physics-informed world model. The world model predicts the net forces and moments acting on the vehicle and integrates them with an RK4 scheme over a 6-DOF rigid-body system. The paper compares this physics-informed approach to a baseline RNN-based implicit dynamics model and finds that, while both can fit replay data, neither generalizes to unseen trajectories, especially during transitions between hover and forward flight, because the data covers operating-point changes only sparsely. The results underscore the persistent challenge of learning accurate, generalizable world models for underactuated, high-dimensional aerial systems and point to data augmentation or richer training curricula as necessary steps toward robust, online-adaptive quadrotor controllers. Effective real-time adaptation will require addressing how transition dynamics are represented and learned beyond hover and trim regimes.
Abstract
Current control algorithms for aerial robots struggle with robustness in dynamic environments and adverse conditions. Model-based reinforcement learning (RL) has shown strong potential in handling these challenges while remaining sample-efficient. Additionally, Dreamer has demonstrated that online model-based RL can be achieved using a recurrent world model trained on replay buffer data. However, applying Dreamer to aerial systems has proven challenging due to its sample inefficiency and the poor generalization of its learned dynamics models. Our work explores a physics-informed approach to world model learning with the goal of improving policy performance. The world model treats the quadcopter as a free-body system and predicts the net forces and moments acting on it, which are then passed through a 6-DOF Runge-Kutta integrator (RK4) to predict future state rollouts. In this paper, we compare this physics-informed method to a standard RNN-based world model. Although both models perform well on the training data, we observe that they fail to generalize to new trajectories. State rollouts diverge rapidly as a result, which prevents policy convergence.
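
To make the integration step concrete, the following is a minimal sketch of one RK4 step of a 6-DOF rigid body driven by a predicted net force and moment. It is not the authors' implementation. The state layout (world-frame position and velocity, a body-to-world quaternion in [w, x, y, z] order, body-frame angular rates), the function names, and the choice to hold the force and moment constant over the step are all assumptions made for illustration. In the paper's setting the force and moment would come from the learned world model; here they are plain arrays.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions in [w, x, y, z] order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def rotate(q, v):
    """Rotate body-frame vector v into the world frame (q assumed near unit norm)."""
    qv = np.concatenate(([0.0], v))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, qv), q_conj)[1:]

def derivative(state, force_b, moment_b, mass, inertia):
    """Time derivative of the assumed 13-dimensional state [p, v, q, w]."""
    v, q, w = state[3:6], state[6:10], state[10:13]
    p_dot = v
    # Net force (assumed to already include gravity) rotated into the world frame.
    v_dot = rotate(q, force_b) / mass
    # Quaternion kinematics: q_dot = 0.5 * q (x) [0, w].
    q_dot = 0.5 * quat_mul(q, np.concatenate(([0.0], w)))
    # Euler's rotational equation: I w_dot = M - w x (I w).
    w_dot = np.linalg.solve(inertia, moment_b - np.cross(w, inertia @ w))
    return np.concatenate((p_dot, v_dot, q_dot, w_dot))

def rk4_step(state, force_b, moment_b, mass, inertia, dt):
    """Advance the state by dt with classical RK4, holding force and moment fixed."""
    f = lambda s: derivative(s, force_b, moment_b, mass, inertia)
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    nxt = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    nxt[6:10] /= np.linalg.norm(nxt[6:10])  # re-normalize the quaternion
    return nxt

# Hypothetical usage: identity attitude, at rest, illustrative mass and inertia.
state0 = np.zeros(13)
state0[6] = 1.0
inertia = np.diag([0.1, 0.1, 0.2])
state1 = rk4_step(state0, np.array([0.0, 0.0, 4.0]), np.zeros(3), 2.0, inertia, 0.1)
```

A multi-step rollout repeats `rk4_step` with a fresh force and moment prediction at each step, so any error in those predictions compounds through the integrator. This is one way to see why a model that fits the training data can still diverge quickly on unseen trajectories.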
