Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning

Tobias Nagel; Marco F. Huber

TL;DR

The paper tackles the challenge of identifying continuous-time, nonlinear ODEs from noisy data for data-efficient model-based reinforcement learning. It introduces the ODE-Learner, a hybrid framework that combines an Extended Kalman-Bucy Filter (EKBF), Physics-Informed Neural Networks, and an Equation Learner, using an ODE-Network of operator-neurons. The framework learns the state equation $\dot{\mathbf{x}}=\mathbf{f}(\mathbf{x},\mathbf{u},\mathbf{w},t)$ and the output equation $\mathbf{y}=\mathbf{g}(\mathbf{x},\mathbf{u},\mathbf{v},t)$ and allows prior knowledge to be incorporated. It trains four networks under four loss terms that enforce data fidelity, EKBF consistency, and regularization, which makes the identification robust to noisy data. Validation on a Duffing oscillator, cascaded tanks, and an inverted pendulum demonstrates superior data efficiency and competitive or superior accuracy relative to baselines, and the learned ODE can be used within an MPC-based RL loop to swing up the inverted pendulum on a cart. The data efficiency comes at the cost of substantial training time and careful hyperparameter tuning, which points to future work on efficiency and on parallelizing control learning.
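The EKBF named above propagates a state estimate and its covariance through coupled ODEs. As a hedged illustration of a generic textbook filter (not the authors' implementation), the sketch below assumes additive noise, i.e. $\dot{\mathbf{x}}=\mathbf{f}(\mathbf{x},\mathbf{u})+\mathbf{w}$ and $\mathbf{y}=\mathbf{g}(\mathbf{x},\mathbf{u})+\mathbf{v}$ with covariances $\mathbf{Q}$ and $\mathbf{R}$, forward-difference Jacobians, and explicit Euler integration; the helper names `jacobian` and `ekbf_step` are hypothetical. It integrates $\dot{\hat{\mathbf{x}}}=\mathbf{f}(\hat{\mathbf{x}},\mathbf{u})+\mathbf{K}(\mathbf{y}-\mathbf{g}(\hat{\mathbf{x}},\mathbf{u}))$ and $\dot{\mathbf{P}}=\mathbf{F}\mathbf{P}+\mathbf{P}\mathbf{F}^\top+\mathbf{Q}-\mathbf{K}\mathbf{R}\mathbf{K}^\top$ with gain $\mathbf{K}=\mathbf{P}\mathbf{H}^\top\mathbf{R}^{-1}$, where $\mathbf{F}$ and $\mathbf{H}$ are the Jacobians of $\mathbf{f}$ and $\mathbf{g}$ with respect to $\mathbf{x}$.

```python
import numpy as np


def jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian of fun at x; column j approximates d fun / d x_j."""
    fx = np.asarray(fun(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.asarray(fun(x + dx), dtype=float) - fx) / eps
    return J


def ekbf_step(x_hat, P, u, y, f, g, Q, R, dt):
    """One explicit-Euler step of a continuous-time extended Kalman-Bucy filter.

    Assumes additive noise: x_dot = f(x, u) + w and y = g(x, u) + v, where w and v
    have covariances Q (process) and R (measurement).
    """
    F = jacobian(lambda x: f(x, u), x_hat)  # linearized dynamics
    H = jacobian(lambda x: g(x, u), x_hat)  # linearized measurement
    K = P @ H.T @ np.linalg.inv(R)  # Kalman-Bucy gain
    x_dot = f(x_hat, u) + K @ (y - g(x_hat, u))  # estimate ODE with innovation feedback
    P_dot = F @ P + P @ F.T + Q - K @ R @ K.T  # Riccati ODE for the covariance
    return x_hat + dt * x_dot, P + dt * P_dot
```

Euler integration keeps the sketch short; with large gains (small $\mathbf{R}$), a smaller step or an implicit ODE solver is the safer choice.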

Abstract

The identification of a mathematical dynamics model is a crucial step in designing a controller. However, it is often very difficult to identify a system's governing equations, especially in complex environments that combine physical laws from different disciplines. In this paper, we present a new approach that identifies an ordinary differential equation by means of a physics-informed machine learning algorithm. Our method introduces a special neural network that can exploit prior human knowledge to a certain degree and extend it autonomously, so that the resulting differential equations describe the system as accurately as possible. We validate the method on a Duffing oscillator with simulation data and, additionally, on a cascaded tank example with real-world data. Subsequently, we use the developed algorithm in a model-based reinforcement learning framework by alternately identifying a system and controlling it to a target state. We test the performance by swinging up an inverted pendulum on a cart.
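A network that exploits prior knowledge and extends it can be illustrated with an Equation-Learner-style layer, in which hidden units apply fixed operators such as identity, $\sin$, $\cos$, and pairwise products instead of a generic activation, so that the weights read as a symbolic expression. The sketch below is a hypothetical stand-in, not the paper's operator-neuron or ODE-Network: the operator set, the names `eql_layer` and `hybrid_rhs`, and the additive combination $\dot{\mathbf{x}}=\mathbf{f}_{\text{prior}}(\mathbf{x},\mathbf{u})+\mathbf{W}_{\text{out}}\,\mathbf{h}(\mathbf{x},\mathbf{u})$ of a known model with a learned correction are all assumptions, and the weights are set by hand rather than trained.

```python
import numpy as np

# Hypothetical operator set: each unary unit applies one of these functions to an
# affine combination of the inputs (identity, sine, cosine).
UNARY_OPS = (lambda a: a, np.sin, np.cos)


def eql_layer(z, W_u, b_u, W_p, b_p):
    """Forward pass of one Equation-Learner-style hidden layer (illustrative sketch).

    z: inputs, shape (n,); W_u: (len(UNARY_OPS), n); W_p: (2 * m, n).
    Returns the unary-unit outputs followed by m pairwise products.
    """
    a = W_u @ z + b_u
    unary = np.array([op(a_i) for op, a_i in zip(UNARY_OPS, a)])
    c = W_p @ z + b_p
    products = c[0::2] * c[1::2]  # rows (0, 1), (2, 3), ... feed one product unit each
    return np.concatenate([unary, products])


def hybrid_rhs(x, u, prior, params):
    """Right-hand side x_dot = prior(x, u) + W_out @ h, with h from the EQL-style layer.

    `prior` encodes known physics; W_out and the hidden weights would be learned.
    """
    z = np.concatenate([x, np.atleast_1d(u)])
    h = eql_layer(z, *params["hidden"])
    return prior(x, u) + params["W_out"] @ h
```

For example, with a pendulum state $(\theta,\omega)$, a prior that only encodes $\dot\theta=\omega$ and $\dot\omega=u$, and hand-set weights that route $\theta$ into the $\sin$ unit and $\omega$ into the identity unit, the layer adds a damped pendulum term $-c\sin\theta-d\,\omega$. In training, a sparsity penalty would typically push unused weights toward zero so that the learned correction stays readable.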

Paper Structure

This paper contains 16 sections, 23 equations, 6 figures, and 1 table.

Introduction
Related Work
Fundamentals
Extended Kalman-Bucy-Filter
Physics-Informed Neural Network
Equation Learner
Model-based Reinforcement Learning
Learning ODEs
ODE-Network
The ODE-Learner Framework
Validation
Duffing Oscillator
Cascaded Tanks
Inverted Pendulum on a Cart
Discussion

Figures (6)

Figure 1: Sketches of the proposed architecture of a single operator-neuron (left) and of the whole ODE-Network (right).
Figure 2: Sketch of the ODE-Learner concept, describing the interactions of the first three loss functions. The different colors indicate the influence of a network on the respective loss.
Figure 3: Root mean squared error after identifying the Duffing oscillator with ten different parameter scenarios. Dots indicate outliers.
Figure 4: Sketch of the cascaded tank system.
Figure 5: Plots of the simulated height for the cascaded tank problem. (a) shows the result of our identified ODE, obtained by an ODE-Network and an MLP, compared to the original test data, while (b) shows the results of state-of-the-art system identification methods.