
Learning from Less: SINDy Surrogates in RL

Aniket Dixit, Muhammad Ibrahim Khan, Faizan Ahmed, James Brusey

TL;DR

This work introduces SINDy-based surrogate environments to reduce data requirements in reinforcement learning, demonstrating that sparse, interpretable dynamical models can replace heavy physics engines while preserving learning outcomes. By collecting a small number of transitions with pre-trained agents and fitting SINDy models to them, the authors construct surrogate environments in which $s_{t+1} = f_{\text{SINDy}}(s_t, a_t)$, enabling SD-RL with substantially fewer training steps. On Mountain Car and Lunar Lander, the approach achieves state-wise correlations exceeding $0.997$ and MSEs as low as roughly $10^{-6}$, using only 75 and 1,000 transitions respectively, and reduces computational cost by roughly 20-35%. The results point to a practical, interpretable route to rapid, safe policy development in resource-constrained or safety-critical domains, and suggest scalability and real-world testing as next steps.
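To make the fitting step concrete, here is a minimal sketch of SINDy-style sparse regression (sequentially thresholded least squares) that learns the discrete map $s_{t+1} = f(s_t, a_t)$ for Mountain Car and then uses it as a surrogate transition. It assumes the classic Gym Mountain Car update and constants (force 0.001, gravity 0.0025) but drops the position/velocity clipping, and it samples 75 random state-action pairs rather than rollouts from a pre-trained agent. The candidate library, threshold, and helper names (`library`, `stlsq`, `surrogate_step`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Classic Gym Mountain Car constants; clipping of position/velocity is
# deliberately omitted (assumption) so the map is exactly representable.
FORCE, GRAVITY = 0.001, 0.0025

def true_step(x, v, a):
    """Ground-truth (unclipped) Mountain Car transition."""
    v_next = v + (a - 1) * FORCE - GRAVITY * np.cos(3 * x)
    x_next = x + v_next
    return x_next, v_next

NAMES = ["1", "x", "v", "a", "cos(3x)", "sin(3x)"]

def library(x, v, a):
    """Hypothetical candidate-function library Theta(s, a), one column per term."""
    return np.column_stack([np.ones_like(x), x, v, a, np.cos(3 * x), np.sin(3 * x)])

def stlsq(theta, y, threshold=1e-4, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(y.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], y[:, k], rcond=None)[0]
    return xi

def surrogate_step(state, action, xi):
    """Surrogate transition s_{t+1} = Theta(s_t, a_t) @ Xi."""
    x, v = state
    row = library(np.array([x]), np.array([v]), np.array([float(action)]))
    return tuple((row @ xi)[0])

# 75 random transitions (the paper's Mountain Car budget); random sampling
# instead of pre-trained-agent rollouts is an assumption of this sketch.
rng = np.random.default_rng(0)
n = 75
x = rng.uniform(-1.2, 0.6, n)
v = rng.uniform(-0.07, 0.07, n)
a = rng.integers(0, 3, n).astype(float)

x_next, v_next = true_step(x, v, a)
xi = stlsq(library(x, v, a), np.column_stack([x_next, v_next]))

for name, row in zip(NAMES, xi):
    print(f"{name:>8}: x_next {row[0]: .5f}   v_next {row[1]: .5f}")
```

Because the unclipped map is linear in these candidate terms, the regression should recover the true coefficients (constant -0.001, action 0.001, cos(3x) -0.0025, unit terms on x and v) and zero out sin(3x). With clipped, noisy, or policy-generated data, the library and threshold would need tuning.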

Abstract

This paper introduces an approach for developing surrogate environments in reinforcement learning (RL) using the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm. We demonstrate the effectiveness of our approach through extensive experiments in OpenAI Gym environments, particularly Mountain Car and Lunar Lander. Our results show that SINDy-based surrogate models can accurately capture the underlying dynamics of these environments while reducing computational costs by 20-35%. With only 75 interactions for Mountain Car and 1,000 for Lunar Lander, we achieve state-wise correlations exceeding 0.997, with mean squared errors as low as 3.11e-06 for Mountain Car velocity and 1.42e-06 for Lunar Lander position. RL agents trained in these surrogate environments require fewer total steps (65,075 vs. 100,000 for Mountain Car and 801,000 vs. 1,000,000 for Lunar Lander) while achieving performance comparable to that of agents trained in the original environments, exhibiting similar convergence patterns and final performance metrics. This work contributes to the field of model-based RL by providing an efficient method for generating accurate, interpretable surrogate environments.
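The evaluation quantities quoted in the abstract can be sketched as follows. This is a hedged illustration: it assumes "state-wise correlation" means the per-dimension Pearson coefficient between a true and a surrogate state trajectory and that MSE is averaged per state dimension; the abstract does not specify trajectory length or the rollout policy, and the toy trajectories below are synthetic, not the paper's data. The last part only does arithmetic on the reported step totals.

```python
import numpy as np

def statewise_metrics(true_traj, surr_traj):
    """Per-dimension Pearson correlation and MSE for arrays of shape (T, state_dim)."""
    true_traj = np.asarray(true_traj, dtype=float)
    surr_traj = np.asarray(surr_traj, dtype=float)
    corr = np.array([np.corrcoef(true_traj[:, j], surr_traj[:, j])[0, 1]
                     for j in range(true_traj.shape[1])])
    mse = np.mean((true_traj - surr_traj) ** 2, axis=0)
    return corr, mse

# Toy trajectories (made-up data): a surrogate that deviates from the
# true 2-D state by a small oscillation.
t = np.linspace(0.0, 10.0, 200)
true_traj = np.column_stack([np.sin(t), np.cos(t)])
surr_traj = true_traj + 1e-3 * np.sin(7 * t)[:, None]
corr, mse = statewise_metrics(true_traj, surr_traj)
print("correlation:", corr, "MSE:", mse)

# Relative reduction in total training steps, using the totals in the abstract.
REPORTED = {"Mountain Car": (65_075, 100_000), "Lunar Lander": (801_000, 1_000_000)}
reduction = {env: 1 - steps / base for env, (steps, base) in REPORTED.items()}
for env, r in reduction.items():
    print(f"{env}: {r:.1%} fewer steps")
# Mountain Car: 34.9% fewer steps
# Lunar Lander: 19.9% fewer steps
```

The step reductions (about 34.9% and 19.9%) coincide with the endpoints of the 20-35% savings range stated above.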


Paper Structure

This paper contains 10 sections, 1 equation, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Mountain Car policy comparison showing remarkably similar force application strategies. Both policies exhibit identical momentum-building (blue) and goal-targeting (red) regions.
  • Figure 2: Lunar Lander policy comparison showing consistent control strategies across both environments.