Deep Reinforcement Learning Control for Disturbance Rejection in a Nonlinear Dynamic System with Parametric Uncertainty
Vincent W. Hill
TL;DR
This paper tackles disturbance rejection for a nonlinear flexible inverted pendulum with cart (FIPWC) whose model parameters are uncertain, using Deep Deterministic Policy Gradient (DDPG), an actor-critic method, to learn a continuous force policy. Disturbances are modeled as Ornstein–Uhlenbeck processes, and the policy is trained in a Gym/Keras-RL environment by maximizing the expected return $J = \mathbb{E}[R_1]$, where $R_1$ is the discounted return from the first time step. The per-step reward $r$ penalizes state deviation and control effort: $r = -\Delta t ( \sum_{i} w_i (x_i - x_{i,des})^2 + 0.1 F^2 )$, where $x_i$ are the states, $x_{i,des}$ their desired values, $w_i$ the state weights, and $F$ the control force. The FIPWC dynamics are derived from a Lagrangian, include parametric uncertainty in stiffness and damping, and are integrated with a fourth-order Runge-Kutta method. DRL results are benchmarked against a PD controller over 10,000 Monte Carlo trials. The findings indicate that DRL provides superior disturbance rejection and tighter state convergence under stochastic disturbances, suggesting practical potential for robust control of uncertain nonlinear systems, though more realistic actuator modeling and state estimation could further improve applicability.
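The reward expression in the summary above can be sketched in a few lines of Python. This is only an illustration of the formula as stated, not the paper's implementation: the function name `step_reward`, the argument names, and the example weights and time step are hypothetical, since the actual $w_i$ and $\Delta t$ values are not given in this section.

```python
import numpy as np

def step_reward(x, x_des, w, force, dt, effort_weight=0.1):
    """Per-step reward r = -dt * (sum_i w_i (x_i - x_des_i)^2 + 0.1 F^2).

    All argument names are illustrative assumptions; only the form of the
    expression and the 0.1 effort weight come from the summary.
    """
    x = np.asarray(x, dtype=float)
    x_des = np.asarray(x_des, dtype=float)
    w = np.asarray(w, dtype=float)
    # Weighted squared deviation of each state from its desired value
    state_cost = np.sum(w * (x - x_des) ** 2)
    # Quadratic penalty on the control force
    effort_cost = effort_weight * force ** 2
    return -dt * (state_cost + effort_cost)

# Hypothetical two-state example with made-up weights and time step
r = step_reward(x=[1.0, 0.0], x_des=[0.0, 0.0], w=[2.0, 1.0], force=1.0, dt=0.01)
```

Because the reward is never positive and equals zero only when every state sits at its desired value with zero force, maximizing the return pushes the policy toward small deviations achieved with little control effort.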
Abstract
This work describes a technique for actively rejecting multiple independent, time-correlated stochastic disturbances acting on a nonlinear flexible inverted pendulum with cart system whose model parameters are uncertain. The control law is determined through deep reinforcement learning, specifically a continuous actor-critic variant of deep Q-learning known as Deep Deterministic Policy Gradient, while the disturbance magnitudes evolve via independent stochastic processes. Simulation results are then compared with those from a classical control system.
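The summary identifies the time-correlated disturbance processes as Ornstein–Uhlenbeck processes. As a rough sketch of how one such disturbance magnitude might be simulated, the following uses an Euler–Maruyama discretization. The discretization scheme, the function name `simulate_ou`, and the parameters `theta`, `mu`, and `sigma` are assumptions made for illustration; the paper's actual parameter values and sampling scheme are not stated in this section.

```python
import numpy as np

def simulate_ou(n_steps, dt, theta, mu, sigma, x0=0.0, rng=None):
    """Simulate dx = theta * (mu - x) dt + sigma dW with Euler-Maruyama.

    theta: mean-reversion rate, mu: long-run mean, sigma: noise intensity.
    All names and values here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Wiener increment over one step: zero mean, variance dt
        dw = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + theta * (mu - x[k]) * dt + sigma * dw
    return x

# Two independent disturbances, each drawn from its own generator
d1 = simulate_ou(1000, 0.01, theta=1.0, mu=0.0, sigma=0.5, rng=np.random.default_rng(0))
d2 = simulate_ou(1000, 0.01, theta=1.0, mu=0.0, sigma=0.5, rng=np.random.default_rng(1))
```

Each sample depends on the previous one through the mean-reversion term, which is what makes the disturbance time-correlated rather than white noise. Separate generators keep the processes independent of one another.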
