Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning
Zakariae El Asri, Olivier Sigaud, Nicolas Thome
TL;DR
This work addresses three competing requirements of real-world RL: sample efficiency, inference time, and asymptotic performance. It introduces PhIHP, which pairs a physics-informed model with hybrid planning. PhIHP learns a dynamics model that combines a known physical prior with a neural residual, trains a policy and value function with TD3 on trajectories imagined from that model, and uses a CEM-based hybrid planner that blends policy guidance with model-based evaluation to speed up planning. On six ODE-governed control tasks with friction, the approach yields better sample efficiency, competitive asymptotic performance, and lower inference time than strong baselines such as TD-MPC and TD3. The results also indicate that physics priors help the model generalize and that training the policy in imagination can be preferable to training it on real data alone, which makes PhIHP a practical route toward real-time, data-efficient RL in domains where partial physical knowledge is available.
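To make the "physical prior plus learned residual" idea concrete, here is a minimal sketch under stated assumptions. The pendulum system, the constants, the function names, and the feature map are all hypothetical and chosen for illustration. The residual is fitted with linear least squares rather than a neural network, purely to keep the example short and dependency-free; it should not be read as the authors' implementation.

```python
import numpy as np

# Assumed constants for a toy pendulum (not taken from the paper).
G, LENGTH, DT = 9.81, 1.0, 0.05


def physics_prior(state, action):
    """Known physics: angular acceleration of a frictionless pendulum."""
    theta, _omega = state
    return -(G / LENGTH) * np.sin(theta) + action


def true_accel(state, action, friction=0.3):
    """Data generator only: the real system adds friction the prior ignores."""
    return physics_prior(state, action) - friction * state[1]


def features(state, action):
    """Hypothetical feature map for the residual (stand-in for a neural net)."""
    theta, omega = state
    return np.array([np.sin(theta), omega, action, 1.0])


def fit_residual(states, actions, accels):
    """Fit the gap between observed accelerations and the physics prior."""
    X = np.stack([features(s, a) for s, a in zip(states, actions)])
    y = np.array([acc - physics_prior(s, a)
                  for s, a, acc in zip(states, actions, accels)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def hybrid_step(state, action, w):
    """One explicit Euler step of the combined model: prior + residual."""
    theta, omega = state
    accel = physics_prior(state, action) + features(state, action) @ w
    return np.array([theta + DT * omega, omega + DT * accel])


# Collect a small batch of transitions from the "real" system and fit.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(50, 2))
actions = rng.uniform(-1.0, 1.0, size=50)
accels = [true_accel(s, a) for s, a in zip(states, actions)]
w = fit_residual(states, actions, accels)
next_state = hybrid_step(np.array([0.1, 0.0]), 0.0, w)
```

In this toy setting the residual only has to explain the friction term, which is why a small batch of transitions suffices. That is the intuition behind the sample-efficiency argument: the prior carries most of the dynamics, and the learned part covers what the prior leaves out.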
Abstract
Applying reinforcement learning (RL) to real-world applications requires addressing a trade-off between asymptotic performance, sample efficiency, and inference time. In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge about the system dynamics. Our approach involves learning a physics-informed model to boost sample efficiency and generating imaginary trajectories from this model to learn a model-free policy and Q-function. Furthermore, we propose a hybrid planning strategy, combining the learned policy and Q-function with the learned model to enhance time efficiency in planning. Through practical demonstrations, we illustrate that our method improves the compromise between sample efficiency, time efficiency, and performance over state-of-the-art methods.
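The hybrid planning strategy described above can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the 1-D integrator standing in for the learned model, the hand-written `policy` and `q_value` standing in for the learned policy and Q-function, and the specific way they are combined (policy rollout to initialise the CEM sampling mean, Q-function to bootstrap the return beyond the planning horizon). The paper's actual planner may differ in each of these choices.

```python
import numpy as np


def model(x, a):
    """Stand-in for the learned dynamics model: a 1-D integrator."""
    return x + 0.1 * a


def reward(x, a):
    """Toy reward: stay close to the origin."""
    return -(x ** 2)


def policy(x):
    """Stand-in for the learned policy."""
    return np.clip(-x, -1.0, 1.0)


def q_value(x, a):
    """Stand-in for the learned Q-function."""
    return -10.0 * model(x, a) ** 2


def hybrid_cem_plan(x0, horizon=5, n_samples=64, n_elites=8,
                    n_iters=3, gamma=0.99, seed=0):
    rng = np.random.default_rng(seed)

    # Policy guidance: initialise the sampling mean by rolling the
    # policy through the model instead of starting from zeros.
    mean = np.empty(horizon)
    x = x0
    for t in range(horizon):
        mean[t] = policy(x)
        x = model(x, mean[t])
    std = np.full(horizon, 0.5)

    for _ in range(n_iters):
        noise = rng.standard_normal((n_samples, horizon))
        actions = np.clip(mean + std * noise, -1.0, 1.0)

        # Model-based evaluation: roll out every candidate sequence.
        returns = np.zeros(n_samples)
        x = np.full(n_samples, x0, dtype=float)
        for t in range(horizon):
            returns += gamma ** t * reward(x, actions[:, t])
            x = model(x, actions[:, t])

        # Bootstrap beyond the horizon with the Q-function.
        returns += gamma ** horizon * q_value(x, policy(x))

        # Refit the sampling distribution to the best candidates.
        elites = actions[np.argsort(returns)[-n_elites:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6

    return float(mean[0])  # execute only the first planned action


first_action = hybrid_cem_plan(1.0)
```

Starting the search near the policy's proposal is what allows a small sample budget and few iterations, and the terminal Q-value is what allows a short horizon. Both reduce the number of model calls per decision, which is the time-efficiency side of the trade-off the abstract describes.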
