Control Synthesis with Reinforcement Learning: A Modeling Perspective

Nikki Xu, Hien Tran

TL;DR

This work investigates the sim-to-real gap in reinforcement-learning controllers for a cart-pole system by comparing policies trained on a high-fidelity lab dynamics model versus a simplified linear model. It demonstrates that high-fidelity training yields robust, disturbance-tolerant stabilization directly transferable to hardware, while training on an inaccurate model leads to brittle performance in the physical arena. The authors introduce a fast, accurate complex-step local sensitivity analysis and an empirical region-of-attraction estimation to quantify robustness and safety margins under model mismatch. They discuss limitations such as distribution shift and propose future directions including safety constraints and symmetry-aware network design for more reliable deployable RL controllers.

Abstract

Controllers designed with reinforcement learning can be sensitive to model mismatch. We demonstrate that controllers designed in a virtual simulation environment with an inaccurate model are not suitable for deployment in a physical setup. Controllers designed using an accurate model are robust against disturbance and small mismatch between the physical setup and the mathematical model derived from first principles, while a poor model results in a controller that performs well in simulation but fails in physical experiments. Sensitivity analysis is used to justify these discrepancies, and an empirical region-of-attraction estimation helps us visualize their robustness.
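
The TL;DR mentions a complex-step local sensitivity analysis. As a rough illustration of the general complex-step idea only (not the authors' implementation), the sketch below approximates a derivative as Im(f(x + ih)) / h. The function names and the test function are hypothetical and chosen purely for demonstration; the paper's actual models and parameters are not reproduced here.

```python
import numpy as np


def complex_step_derivative(f, x, h=1e-20):
    """Approximate df/dx at a real point x via the complex-step formula.

    Assumes f is real-analytic and implemented with operations that
    accept complex inputs (no abs(), no comparisons on the argument).
    """
    return np.imag(f(x + 1j * h)) / h


def example_response(theta):
    # Hypothetical smooth test function, not taken from the paper.
    return np.sin(theta) * np.exp(-theta)


x0 = 0.7
approx = complex_step_derivative(example_response, x0)

# Closed-form derivative of sin(x) * exp(-x), used only as a check.
exact = np.exp(-x0) * (np.cos(x0) - np.sin(x0))
error = abs(approx - exact)
```

Because the formula involves no subtraction of nearly equal numbers, the step `h` can be made extremely small without the cancellation error that limits finite differences. This is the usual motivation for the technique.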

Paper Structure

This paper contains 19 sections, 19 equations, 27 figures, 2 tables, 1 algorithm.

Figures (27)

  • Figure 1: Sketch of Pendulum on Cart
  • Figure 2: Architecture of the Neural Network Parameterizing a Normal Distribution for Policy. Inputs are state variables; outputs are $\mu$ and $\sigma$, the mean and variance, respectively, of the normal distribution from which the actions are sampled at each time step.
  • Figure 3: Control Learned with Different Systems
  • Figure 4: Control Learned with LTI Systems Simulated with Different Systems
  • Figure 5: Controller Learned with LTI Model Generally Fails in Lab Experiment
  • ...and 22 more figures