Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning

Zakariae El Asri, Olivier Sigaud, Nicolas Thome

TL;DR

This work tackles the real-world RL triad of sample efficiency, inference time, and performance by introducing PhIHP, a method that pairs a physics-informed model with hybrid planning. PhIHP learns a dynamics model that combines a known physical prior with a neural residual. It then trains a policy and Q-function with TD3 on trajectories imagined by that model, and it speeds up planning with a CEM-based hybrid planner that blends policy guidance with model-based evaluation. Across six ODE-governed control tasks with friction, the approach yields superior sample efficiency, competitive asymptotic performance, and reduced inference time, outperforming strong baselines such as TD-MPC and TD3. By demonstrating robust generalization from physics priors and highlighting the benefits of learning from imagined rather than real trajectories, PhIHP offers a practical pathway toward real-time, data-efficient RL in domains where partial physical knowledge is available.
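The following minimal Python sketch makes the idea of a physics-informed model concrete. It rests on several assumptions. A frictionless pendulum plays the role of the known physical prior. The "real" system adds viscous friction that the prior ignores. A least-squares linear residual (`LinearResidual`, a hypothetical name) replaces the paper's neural residual so that the example needs only NumPy. The physical parameters are held fixed here for brevity. The `imagine` function, also hypothetical, shows how Dyna-style imagined transitions for TD3 could be generated from the hybrid model. None of these names come from the authors' code.

```python
import numpy as np

DT = 0.05  # Euler integration step (assumed)


def physics_prior(state, action, g=9.81, length=1.0, mass=1.0):
    """Known part of the dynamics: a frictionless pendulum (assumed example)."""
    theta, omega = state[:, 0], state[:, 1]
    domega = -(g / length) * np.sin(theta) + action / (mass * length**2)
    return np.stack([omega, domega], axis=1)


def true_dynamics(state, action, friction=0.3):
    """Stand-in for the real system: the prior plus unmodeled viscous friction."""
    deriv = physics_prior(state, action)
    deriv[:, 1] -= friction * state[:, 1]
    return deriv


def euler_step(state, action, deriv_fn):
    """One explicit Euler step of the given time-derivative function."""
    return state + DT * deriv_fn(state, action)


class LinearResidual:
    """Hypothetical residual term. The text describes a neural residual; a
    least-squares linear model keeps this sketch dependency-free."""

    def __init__(self):
        self.weights = None

    @staticmethod
    def features(state, action):
        return np.column_stack([state, action, np.ones(len(state))])

    def fit(self, states, actions, targets):
        X = self.features(states, actions)
        self.weights, *_ = np.linalg.lstsq(X, targets, rcond=None)

    def __call__(self, state, action):
        return self.features(state, action) @ self.weights


def hybrid_step(state, action, residual):
    """Physics-informed prediction: integrated prior plus learned correction."""
    return euler_step(state, action, physics_prior) + residual(state, action)


def imagine(start_states, policy, residual, horizon=10):
    """Dyna-style imagined rollouts; an actor-critic such as TD3 could train on these."""
    transitions, s = [], start_states
    for _ in range(horizon):
        a = policy(s)
        s_next = hybrid_step(s, a, residual)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions


rng = np.random.default_rng(0)
states = rng.uniform([-np.pi, -2.0], [np.pi, 2.0], size=(200, 2))
actions = rng.uniform(-2.0, 2.0, size=200)
next_states = euler_step(states, actions, true_dynamics)  # "real" transitions

residual = LinearResidual()
residual.fit(states, actions, next_states - euler_step(states, actions, physics_prior))

prior_err = np.abs(euler_step(states, actions, physics_prior) - next_states).max()
hybrid_err = np.abs(hybrid_step(states, actions, residual) - next_states).max()
batch = imagine(states[:5], lambda s: np.zeros(len(s)), residual)
```

In this toy setup, the friction term is exactly linear in the angular velocity, so the residual can absorb it completely. A real residual network would only approximate the mismatch between prior and system.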

Abstract

Applying reinforcement learning (RL) to real-world applications requires addressing a trade-off between asymptotic performance, sample efficiency, and inference time. In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge about the system dynamics. Our approach involves learning a physics-informed model to boost sample efficiency and generating imaginary trajectories from this model to learn a model-free policy and Q-function. Furthermore, we propose a hybrid planning strategy, combining the learned policy and Q-function with the learned model to enhance time efficiency in planning. Through practical demonstrations, we illustrate that our method improves the compromise between sample efficiency, time efficiency, and performance over state-of-the-art methods.
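The hybrid planning idea can be sketched as a cross-entropy method (CEM) loop. Its candidate pool mixes action sequences proposed by the policy with Gaussian random ones. Each candidate is scored by its model-based discounted return plus a terminal Q-value. This is a minimal sketch under stated assumptions, not the authors' planner. The functions `model_step`, `reward`, `policy`, and `q_value` are hypothetical stand-ins: a pendulum model, a quadratic cost, a PD controller in place of the TD3 actor, and a crude critic, respectively. The horizon, sample counts, and noise level are arbitrary.

```python
import numpy as np

DT, GAMMA, MAX_TORQUE = 0.05, 0.99, 2.0  # assumed constants


def model_step(s, a):
    """Hypothetical learned model: batched pendulum, state = (theta, omega)."""
    theta, omega = s[:, 0], s[:, 1]
    domega = -9.81 * np.sin(theta) + a
    return np.stack([theta + DT * omega, omega + DT * domega], axis=1)


def reward(s, a):
    """Hypothetical reward: hold theta at 0 with little velocity and effort."""
    return -(s[:, 0] ** 2 + 0.1 * s[:, 1] ** 2 + 0.001 * a**2)


def policy(s):
    """Stand-in for the learned TD3 actor (a clipped PD controller)."""
    return np.clip(-10.0 * s[:, 0] - 2.0 * s[:, 1], -MAX_TORQUE, MAX_TORQUE)


def q_value(s, a):
    """Stand-in for the learned critic: assumes the current reward persists forever."""
    return reward(s, a) / (1.0 - GAMMA)


def evaluate(s0, seqs):
    """Discounted model-based return of each action sequence plus terminal Q."""
    n, horizon = seqs.shape
    s = np.repeat(s0[None, :], n, axis=0)
    total = np.zeros(n)
    for t in range(horizon):
        total += GAMMA**t * reward(s, seqs[:, t])
        s = model_step(s, seqs[:, t])
    return total + GAMMA**horizon * q_value(s, policy(s))


def policy_proposals(s0, n, horizon, noise, rng):
    """Roll the noisy policy through the model to propose action sequences."""
    s = np.repeat(s0[None, :], n, axis=0)
    seqs = np.empty((n, horizon))
    for t in range(horizon):
        a = policy(s) + noise * rng.standard_normal(n)
        seqs[:, t] = np.clip(a, -MAX_TORQUE, MAX_TORQUE)
        s = model_step(s, seqs[:, t])
    return seqs


def hybrid_cem(s0, horizon=10, n_random=64, n_policy=16, n_elite=8, iters=5, seed=0):
    """CEM over a pool of policy proposals and Gaussian samples; returns the first action."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(horizon), np.full(horizon, MAX_TORQUE / 2)
    pi_seqs = policy_proposals(s0, n_policy, horizon, noise=0.1, rng=rng)
    for _ in range(iters):
        gauss = rng.standard_normal((n_random, horizon))
        rand_seqs = np.clip(mean + std * gauss, -MAX_TORQUE, MAX_TORQUE)
        candidates = np.concatenate([pi_seqs, rand_seqs], axis=0)
        returns = evaluate(s0, candidates)
        elites = candidates[np.argsort(returns)[-n_elite:]]  # highest returns
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return float(mean[0])


first_action = hybrid_cem(np.array([0.5, 0.0]))
```

The sketch illustrates two design choices. Seeding the pool with policy proposals starts the search near reasonable trajectories, so a few CEM iterations over a modest sample pool may suffice. The terminal Q-value compensates for the short planning horizon. Both choices are plausible sources of reduced planning time, although PhIHP's exact planner may differ.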

Paper Structure (26 sections, 7 equations, 14 figures, 6 tables)

Figures (14)

  • Figure 1: PhIHP includes a physics-informed model and hybrid planning for efficient policy learning in RL. PhIHP improves the compromise between sample efficiency, time efficiency, and performance over state-of-the-art methods, namely model-free TD3 and hybrid TD-MPC. Results are averaged over 6 tasks towers_gymnasium_2023.
  • Figure 2: Schematic view of PhIHP. (a) We iteratively learn a physics-informed model from few interactions with the environment. (b) We learn a policy and Q-function from trajectories imagined with the learned model. (c) The agent samples actions both from the policy output and at random, then evaluates the resulting trajectories using CEM, a reward function, and the Q-function.
  • Figure 3: Comparison of PhIHP vs. baselines aggregated over 6 control tasks (10 runs). (a) PhIHP shows excellent sample efficiency and better asymptotic performance. (b) Performance profiles are obtained with rliable agarwal2021deep. PhIHP's better performance profiles indicate better robustness to outliers. Comparisons on individual environments are shown in Appendix D.
  • Figure 4: Aggregated median, interquartile mean (IQM), mean performance, and optimality gap of PhIHP and baselines on 6 tasks (10 runs). Higher median, IQM, and mean and a lower optimality gap are better. Confidence intervals are estimated using the percentile bootstrap with stratified sampling agarwal2021deep. PhIHP outperforms the baselines on all metrics.
  • Figure 5: Comparison of PhIHP and its variants on the 3 main metrics. The figures show aggregated results of all algorithms on 6 classic control tasks. Histograms and bars represent the mean and standard deviation over 10 runs.
  • ...and 9 more figures
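Figures 3 and 4 report aggregate metrics computed with rliable. The NumPy sketch below re-implements these metrics for illustration only; it is not the rliable API, and its function names are hypothetical. The metrics are defined as follows:

- The IQM is the mean of the middle 50% of all run-task scores.
- The optimality gap averages how far normalized scores fall below a threshold gamma (assumed to be 1).
- The performance profile is the fraction of scores above each threshold tau.
- Confidence intervals come from a percentile bootstrap that resamples runs independently within each task.

Scores are assumed to be normalized and arranged as an array of shape (runs, tasks). The example scores are synthetic and are not results from the paper.

```python
import numpy as np


def iqm(scores):
    """Interquartile mean: mean of the middle 50% of all (run, task) scores."""
    flat = np.sort(np.asarray(scores, dtype=float).ravel())
    cut = int(0.25 * flat.size)
    return flat[cut:flat.size - cut].mean()


def optimality_gap(scores, gamma=1.0):
    """Average amount by which normalized scores fall short of gamma."""
    return np.maximum(gamma - np.asarray(scores, dtype=float), 0.0).mean()


def performance_profile(scores, taus):
    """Fraction of all (run, task) scores strictly above each threshold tau."""
    flat = np.asarray(scores, dtype=float).ravel()
    return np.array([(flat > tau).mean() for tau in taus])


def stratified_bootstrap_ci(scores, metric, reps=2000, alpha=0.05, seed=0):
    """Percentile CI; runs are resampled with replacement within each task (column)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_runs, n_tasks = scores.shape
    stats = np.empty(reps)
    for i in range(reps):
        idx = rng.integers(0, n_runs, size=(n_runs, n_tasks))
        stats[i] = metric(np.take_along_axis(scores, idx, axis=0))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])


# Synthetic normalized scores for 10 runs x 6 tasks (NOT results from the paper).
demo_scores = np.random.default_rng(1).uniform(0.6, 1.0, size=(10, 6))
demo_iqm_ci = stratified_bootstrap_ci(demo_scores, iqm)
demo_gap = optimality_gap(demo_scores)
demo_profile = performance_profile(demo_scores, np.linspace(0.0, 1.0, 5))
```

Resampling within each task, rather than over the pooled scores, keeps every task represented in every bootstrap replicate. This is the purpose of stratification when the number of runs per task is small.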