Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning
Tobias Nagel, Marco F. Huber
TL;DR
The paper tackles the challenge of identifying continuous-time, nonlinear ordinary differential equations (ODEs) from noisy data for data-efficient model-based reinforcement learning. It introduces the ODE-Learner, a hybrid framework that combines an Extended Kalman-Bucy Filter (EKBF), Physics-Informed Neural Networks, and an Equation Learner with an ODE-Network of operator-neurons. Together they learn the state equation $\dot{\mathbf{x}}=\mathbf{f}(\mathbf{x},\mathbf{u},\mathbf{w},t)$ and the output equation $\mathbf{y}=\mathbf{g}(\mathbf{x},\mathbf{u},\mathbf{v},t)$, where $\mathbf{x}$ is the state, $\mathbf{u}$ the input, and $\mathbf{w}$ and $\mathbf{v}$ denote process and measurement noise, while allowing prior knowledge to be incorporated. The framework trains four networks with four loss terms that enforce data fidelity, EKBF consistency, and regularization, which makes identification robust to noisy data. Validation on a Duffing oscillator, cascaded tanks, and an inverted pendulum shows superior data efficiency and accuracy that is competitive with or better than the baselines, and the learned ODE can be used within a model predictive control (MPC)-based RL loop to solve swing-up tasks. The price of this data efficiency is substantial training time and careful hyperparameter tuning, which points to future work on efficiency and on parallelizing control learning.
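The EKBF named above propagates a state estimate and its covariance in continuous time. The NumPy sketch below is a standalone filter with a known model, not the authors' training pipeline: it assumes a Duffing oscillator with hypothetical parameters, a known excitation input $u$, additive process noise of intensity $Q$, and a measurement $y = x_1 + v$ of position only, and it integrates the filter equations with forward Euler. All function names (`f`, `jac_f`, `simulate`, `ekbf`) and numbers are illustrative.

```python
import numpy as np

# Hypothetical Duffing parameters and excitation (illustrative, not from the paper).
DELTA, ALPHA, BETA = 0.2, 1.0, 1.0


def f(x, u):
    """Drift of the Duffing oscillator, state x = [position, velocity], input u."""
    return np.array([x[1], -DELTA * x[1] - ALPHA * x[0] - BETA * x[0] ** 3 + u])


def jac_f(x):
    """Jacobian of f with respect to the state."""
    return np.array([[0.0, 1.0], [-ALPHA - 3.0 * BETA * x[0] ** 2, -DELTA]])


def simulate(n_steps, dt, x0, sigma, rng):
    """Forward-Euler ground truth without process noise; y = position + Gaussian noise."""
    us = 0.3 * np.cos(1.2 * dt * np.arange(n_steps))  # hypothetical known excitation input
    xs = np.empty((n_steps, 2))
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        xs[k] = x
        x = x + dt * f(x, us[k])
    ys = xs[:, 0] + sigma * rng.standard_normal(n_steps)
    return xs, ys, us


def ekbf(ys, us, x0, P0, Q, sigma, dt):
    """Extended Kalman-Bucy filter integrated with forward Euler.

    Sampled measurements with variance sigma**2 correspond to a continuous-time
    measurement-noise intensity R = sigma**2 * dt.
    """
    H = np.array([[1.0, 0.0]])
    R = sigma**2 * dt
    x, P = np.array(x0, dtype=float), np.array(P0, dtype=float)
    estimates = np.empty((len(ys), 2))
    for k, (y, u) in enumerate(zip(ys, us)):
        estimates[k] = x                      # estimate at t_k, built from y_0 .. y_{k-1}
        K = P @ H.T / R                       # gain K = P H^T R^{-1}, shape (2, 1)
        F = jac_f(x)                          # linearization at the current estimate
        x = x + dt * (f(x, u) + K[:, 0] * (y - x[0]))
        # Riccati ODE: dP/dt = F P + P F^T + Q - P H^T R^{-1} H P
        P = P + dt * (F @ P + P @ F.T + Q - K @ H @ P)
    return estimates


rng = np.random.default_rng(0)
dt, sigma = 0.01, 0.5
xs, ys, us = simulate(2000, dt, [1.0, 0.0], sigma, rng)
est = ekbf(ys, us, x0=[0.8, 0.0], P0=0.1 * np.eye(2),
           Q=np.diag([1e-4, 1e-3]), sigma=sigma, dt=dt)
rmse_meas = np.sqrt(np.mean((ys - xs[:, 0]) ** 2))
rmse_filt = np.sqrt(np.mean((est[:, 0] - xs[:, 0]) ** 2))
print(f"measurement RMSE {rmse_meas:.3f}, filtered RMSE {rmse_filt:.3f}")
```

The filter's model is exact here because the ground truth uses the same Euler step. In the ODE-Learner setting the drift $\mathbf{f}$ is the quantity being learned, so this sketch only illustrates the filtering side of the problem. The initial covariance is kept below the measurement variance because an explicit Euler step of the Riccati equation becomes unstable when $P_{11}$ is much larger than $\sigma^2$.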
Abstract
The identification of a mathematical dynamics model is a crucial step in the design of a controller. However, it is often very difficult to identify a system's governing equations, especially in complex environments that combine physical laws from different disciplines. In this paper, we present a new approach for identifying an ordinary differential equation with a physics-informed machine learning algorithm. Our method introduces a special neural network that can exploit prior human knowledge to a certain degree and extend it autonomously, so that the resulting differential equations describe the system as accurately as possible. We validate the method on a Duffing oscillator with simulation data and, additionally, on a cascaded tank example with real-world data. Subsequently, we use the developed algorithm in a model-based reinforcement learning framework by alternately identifying a system and controlling it to a target state. We test its performance on the task of swinging up an inverted pendulum on a cart.
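The alternating loop of identification and control can be illustrated with a deliberately simple stand-in. The NumPy sketch below assumes a hypothetical unstable scalar plant $\dot{x} = a x + b u$. Ordinary least squares on finite differences replaces the ODE-Learner, and a pole-placement tracking law replaces MPC. The names `rollout`, `identify`, `design`, and `zero_policy` and all numbers are illustrative, not taken from the paper.

```python
import numpy as np

A_TRUE, B_TRUE = 0.5, 1.0  # hypothetical unstable plant dx/dt = a*x + b*u, unknown to the agent


def rollout(policy, x0, n_steps, dt, rng, explore=0.2):
    """Run a policy on the true plant (forward Euler); return states and applied inputs."""
    xs, us = [x0], []
    x = x0
    for _ in range(n_steps):
        u = policy(x) + explore * rng.standard_normal()  # exploration noise excites the plant
        x = x + dt * (A_TRUE * x + B_TRUE * u)
        xs.append(x)
        us.append(u)
    return np.array(xs), np.array(us)


def identify(trajectories, dt):
    """Least-squares fit of (a, b) on all transitions so far (stand-in for the ODE-Learner)."""
    rows, targets = [], []
    for xs, us in trajectories:
        rows.append(np.column_stack([xs[:-1], us]))
        targets.append(np.diff(xs) / dt)  # finite-difference estimate of dx/dt
    (a, b), *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)
    return a, b


def design(a, b, x_target, pole=-2.0):
    """Pole-placement tracking law for the identified model (stand-in for MPC)."""
    k = (a - pole) / b        # model closed loop: d(x - x_target)/dt = (a - b*k)(x - x_target)
    u_ss = -a * x_target / b  # steady-state input that holds x at x_target
    return lambda x: u_ss - k * (x - x_target)


def zero_policy(x):
    return 0.0  # no model yet: apply no control input


rng = np.random.default_rng(1)
dt, x_target = 0.01, 1.0
policy = zero_policy
data = []
for episode in range(3):  # alternate: act on the plant, identify, redesign the controller
    data.append(rollout(policy, 0.0, 300, dt, rng))
    a_hat, b_hat = identify(data, dt)
    policy = design(a_hat, b_hat, x_target)

xs_eval, _ = rollout(policy, 0.0, 500, dt, rng, explore=0.0)
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, final x={xs_eval[-1]:.4f}")
```

Because the toy plant has no measurement noise and is simulated with the same Euler step that the finite differences invert, the first fit already recovers $a$ and $b$. With noisy measurements and nonlinear dynamics, as in the pendulum swing-up, each episode would instead add transitions that refine the model, which is where a noise-robust identifier matters.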
