Adviser-Actor-Critic: Eliminating Steady-State Error in Reinforcement Learning Control

Donghe Chen, Yubin Peng, Tengjie Zheng, Han Wang, Chaoran Qu, Lin Cheng

TL;DR

This work introduces Adviser-Actor-Critic (AAC), which addresses the precision control dilemma by combining the precision of feedback control theory with the adaptive learning capability of RL. An Adviser mentors the actor to refine its control actions, improving the precision of goal attainment.

Abstract

High-precision control tasks present substantial challenges for reinforcement learning (RL) algorithms, frequently resulting in suboptimal performance attributed to network approximation inaccuracies and inadequate sample quality. These issues are exacerbated when the task requires the agent to achieve a precise goal state, as is common in robotics and other real-world applications. We introduce Adviser-Actor-Critic (AAC), which addresses the precision control dilemma by combining the precision of feedback control theory with the adaptive learning capability of RL. It features an Adviser that mentors the actor to refine control actions, thereby enhancing the precision of goal attainment. In benchmark tests, AAC outperformed standard RL algorithms in precision-critical, goal-conditioned tasks, demonstrating its high precision, reliability, and robustness. Code is available at: https://anonymous.4open.science/r/Adviser-Actor-Critic-8AC5.

Paper Structure

This paper contains 19 sections, 22 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: When a robot aims directly for a desired goal, control inaccuracies or model bias can prevent it from reaching the target precisely. However, by guiding the robot toward a strategically placed "fake goal," set by an adviser, it can effectively achieve the desired position.
  • Figure 2: Interact With Environment: The Adviser outputs a synthetic error $\varepsilon$ to guide the Actor Neural Network's decision-making, alongside the achieved goal $g_a$ and current observation $s$. This framework integrates deep learning with classical control theory, enabling adaptive policy optimization and enhancing performance in dynamic environments. Learn from Experience: The Critic Neural Network estimates the state-action value function $Q_{\theta}(s_e, a)$, while the Actor Neural Network generates the policy $\pi_{\phi}(\cdot|s_e)$ based on the augmented observation $s_e$. Experiences are stored as tuples $(s_e, a, r, s_e')$ in an Experience Buffer to facilitate continuous learning. Feedback from the environment includes rewards and new observations.
  • Figure 3: Within the dynamic system, PID controllers transform the actual error $e$ into a fake error $\varepsilon$. For simplified stability analysis, the PID controller parameters and dynamic system parameters are combined, adjusting $K_p' = K_p - a_0$ and $K_d' = K_d - a_1$. $\varepsilon_i$ represents the adjusted error, while $d_i$ denotes disturbances in the $i$-th dimension, accounting for unmodeled dynamics. Notably, when $K_p = 1$, $K_i = 0$, and $K_d = 0$, the PID controller exhibits no additional effect, behaving as if no PID controller were present.
  • Figure 4: Response of a Second-Order System to Command under Different PID Parameters: Analyzing Asymptotic, Marginal, and Instability Conditions
  • Figure 5: Different environments: Position Control of a Mass-Spring-Damper System (left), Robotics Arm Fetch (middle) and Velocity Control of a Quadcopter (right).
  • ...and 4 more figures
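
The captions for Figures 2 and 3 describe an Adviser that turns the actual error $e$ into a synthetic (fake) error $\varepsilon$ through a PID law. The following is a minimal sketch of that idea for a single dimension, not the authors' implementation. The class name `PIDAdviser`, the method `advise`, the time step `dt`, and the discrete-time form (rectangular integration, backward-difference derivative) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDAdviser:
    """Hypothetical one-dimensional adviser: maps actual error e to fake error eps."""
    kp: float = 1.0   # proportional gain (defaults make the adviser a pass-through)
    ki: float = 0.0   # integral gain
    kd: float = 0.0   # derivative gain
    dt: float = 0.01  # assumed control time step in seconds
    integral: float = 0.0
    prev_error: Optional[float] = None

    def advise(self, error: float) -> float:
        # Accumulate the error over time (rectangular integration).
        self.integral += error * self.dt
        # Backward-difference derivative; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# With Kp = 1, Ki = 0, Kd = 0 the fake error equals the actual error.
passthrough = PIDAdviser()
eps_identity = passthrough.advise(0.3)

# With Ki > 0, a persistent error makes the fake error grow step by step.
integrating = PIDAdviser(kp=1.0, ki=2.0, dt=0.1)
eps_first = integrating.advise(0.5)
eps_second = integrating.advise(0.5)
```

In this sketch, the default gains reproduce the no-adviser case noted in the Figure 3 caption. A nonzero integral gain keeps enlarging $\varepsilon$ while a residual error persists, which is the usual feedback-control mechanism for removing steady-state error.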