Control Synthesis with Reinforcement Learning: A Modeling Perspective
Nikki Xu, Hien Tran
TL;DR
This work investigates the sim-to-real gap in reinforcement-learning controllers for a cart-pole system by comparing policies trained on a high-fidelity lab dynamics model with policies trained on a simplified linear model. It demonstrates that high-fidelity training yields robust, disturbance-tolerant stabilization that transfers directly to hardware, while training on an inaccurate model leads to brittle performance on the physical system. The authors introduce a fast, accurate complex-step local sensitivity analysis and an empirical region-of-attraction estimate to quantify robustness and safety margins under model mismatch. They discuss limitations such as distribution shift and propose future directions, including safety constraints and symmetry-aware network design, for more reliable, deployable RL controllers.
Abstract
Controllers designed with reinforcement learning can be sensitive to model mismatch. We demonstrate that a controller designed in a virtual simulation environment with an inaccurate model is not suitable for deployment on a physical setup. A controller designed using an accurate model is robust against disturbances and against small mismatches between the physical setup and the mathematical model derived from first principles, whereas a poor model results in a controller that performs well in simulation but fails in physical experiments. Sensitivity analysis is used to explain these discrepancies, and an empirical region-of-attraction estimate helps us visualize the robustness of each controller.
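As an illustration of the kind of local sensitivity analysis referred to above, the following is a minimal sketch of the complex-step derivative approximation, which estimates the derivative of an output with respect to a parameter as Im f(p + ih) / h. The function `complex_step_sensitivity`, the toy output `toy_output` (the small-angle period of a pendulum), and the parameter values are assumptions chosen for illustration only; they are not the cart-pole model or the code used in this work.

```python
import numpy as np

def complex_step_sensitivity(f, p, h=1e-20):
    """Approximate the Jacobian d f / d p with the complex-step method.

    f must accept a complex-valued parameter vector and be built from
    operations that are analytic in the parameters (no abs, no comparisons).
    """
    p = np.asarray(p, dtype=float)
    n_out = np.atleast_1d(f(p.astype(complex))).size
    jac = np.empty((n_out, p.size))
    for k in range(p.size):
        pc = p.astype(complex)
        pc[k] += 1j * h  # perturb one parameter along the imaginary axis
        jac[:, k] = np.imag(np.atleast_1d(f(pc))) / h
    return jac

# Hypothetical toy output (an assumption, not the model from this work):
# small-angle pendulum period T = 2*pi*sqrt(length / gravity).
def toy_output(p):
    length, gravity = p
    return 2 * np.pi * np.sqrt(length / gravity)

J = complex_step_sensitivity(toy_output, [0.5, 9.81])
print(J)  # sensitivities of the period to length and gravity
```

Because the perturbation is imaginary, the estimate involves no subtraction of nearly equal numbers, so the step `h` can be made extremely small without the cancellation error that limits finite differences.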
