Integrating Lagrangian Neural Networks into the Dyna Framework for Reinforcement Learning

Shreya Das; Kundan Kumar; Muhammad Iqbal; Outi Savolainen; Dominik Baumann; Laura Ruotsalainen; Simo Särkkä

TL;DR

Lagrangian neural networks (LNNs), which enforce an underlying Lagrangian structure, are used to learn the dynamics model within a Dyna-based model-based reinforcement learning (MBRL) framework. Training the LNN with a state-estimation-based optimizer converges faster than training it with a stochastic gradient-based optimizer.

Abstract

Model-based reinforcement learning (MBRL) is sample-efficient but depends on the accuracy of the learned dynamics, which are often modeled using black-box methods that do not adhere to physical laws. Such methods tend to produce inaccurate predictions on data that differ from the training set. In this work, we employ Lagrangian neural networks (LNNs), which enforce an underlying Lagrangian structure, to learn the dynamics model within a Dyna-based MBRL framework. We learn the LNN weights with both a stochastic gradient-based optimizer and a state-estimation-based optimizer; the state-estimation-based method converges faster during neural network training. Simulation results illustrate the effectiveness of the proposed LNN-based Dyna framework for MBRL.
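To make the phrase "underlying Lagrangian structure" concrete, the relation below is the standard Euler-Lagrange equation with a generalized force, solved for the acceleration. It is not quoted from the paper: the force symbol $\tau$ and the assumption that the Hessian of $L$ with respect to $\dot{q}$ is invertible are introduced here for illustration only.

```latex
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = \tau
\quad\Longrightarrow\quad
\ddot{q} = \left(\frac{\partial^{2} L}{\partial \dot{q}\,\partial \dot{q}^{\top}}\right)^{-1}
\left[\frac{\partial L}{\partial q} + \tau
- \frac{\partial^{2} L}{\partial \dot{q}\,\partial q^{\top}}\,\dot{q}\right]
```

Because the network only outputs the scalar $L(q, \dot{q})$, every predicted acceleration is obtained through this relation rather than regressed directly, which is where the physical constraint enters.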

Paper Structure

This paper contains 9 sections, 19 equations, 3 figures, and 3 algorithms.

Introduction
Integrating Lagrangian Neural Networks into the Dyna Framework
State Estimation-Based Learning in LNNs
Lagrangian mechanics
Stochastic gradient-based optimization of LNN weights
State estimation based optimization of LNN weights
LNN-Based Dyna Framework for MBRL
Simulation Results
Conclusion

Figures (3)

Figure 1: The basic idea behind the developed method of LNN-based MBRL in the Dyna framework. We learn the dynamics model using the LNN from real environment data and use an RK-2 integrator to generate model-based rollouts. The policy and value networks are trained on both real and model-generated data, improving sample efficiency while preserving physical consistency.
Figure 2: Model learning using the LNN. The network takes $(q, \,\dot{q})$ as input and outputs the Lagrangian $L$, from which $\ddot{q}$ is evaluated using the Euler-Lagrange equation. The second-order Runge-Kutta (RK-2) method is then used to compute $(q_{t+1}, \, \dot{q}_{t+1})$. The reward is learned from the state-action pair.
Figure 3: Average return versus timestep for the proposed PIMBRL using an LNN with Adam and EKF as optimizers, PIMBRL using a DNN with constraints as in [liu2021physics], and MFRL.
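The pipeline described in the caption of Figure 2 (Lagrangian, then acceleration, then an RK-2 step) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a hand-written pendulum Lagrangian stands in for the trained LNN, derivatives are taken by central finite differences instead of automatic differentiation, the midpoint rule is assumed as the RK-2 variant, and all names (`lagrangian`, `acceleration`, `rk2_step`, `rollout`) and constants are hypothetical.

```python
import math

# Assumed pendulum constants, chosen only for illustration.
G, LENGTH, MASS = 9.81, 1.0, 1.0


def lagrangian(q, qd):
    """Stand-in for the LNN output L(q, q_dot): kinetic minus potential energy."""
    kinetic = 0.5 * MASS * LENGTH**2 * qd**2
    potential = -MASS * G * LENGTH * math.cos(q)
    return kinetic - potential


def acceleration(q, qd, tau=0.0, h=1e-4):
    """Solve the 1-DoF Euler-Lagrange equation for q_ddot via central differences."""
    L = lagrangian
    dL_dq = (L(q + h, qd) - L(q - h, qd)) / (2 * h)
    d2L_dqd2 = (L(q, qd + h) - 2 * L(q, qd) + L(q, qd - h)) / h**2
    d2L_dq_dqd = (L(q + h, qd + h) - L(q + h, qd - h)
                  - L(q - h, qd + h) + L(q - h, qd - h)) / (4 * h**2)
    # d/dt(dL/dqd) - dL/dq = tau, expanded with the chain rule and solved for q_ddot.
    return (dL_dq + tau - d2L_dq_dqd * qd) / d2L_dqd2


def rk2_step(q, qd, dt, tau=0.0):
    """One explicit midpoint (RK-2) step for the state (q, q_dot)."""
    k1_q, k1_qd = qd, acceleration(q, qd, tau)
    q_mid, qd_mid = q + 0.5 * dt * k1_q, qd + 0.5 * dt * k1_qd
    k2_q, k2_qd = qd_mid, acceleration(q_mid, qd_mid, tau)
    return q + dt * k2_q, qd + dt * k2_qd


def rollout(q, qd, dt, steps):
    """Unforced model rollout, as used to generate synthetic transitions."""
    for _ in range(steps):
        q, qd = rk2_step(q, qd, dt)
    return q, qd


def energy(q, qd):
    """Total energy of the pendulum, used here only as a sanity check."""
    return 0.5 * MASS * LENGTH**2 * qd**2 - MASS * G * LENGTH * math.cos(q)
```

In a Dyna-style loop, a function like `rollout` would supply the model-generated transitions that are mixed with real environment data. With a learned network, the finite differences would normally be replaced by automatic differentiation.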
