Steady-State Error Compensation for Reinforcement Learning with Quadratic Rewards

Liyao Wang, Zishun Zheng, Yuan Lin

TL;DR

The paper addresses the steady-state error that arises in reinforcement learning when quadratic rewards are used. It adds an integral term to the quadratic reward to capture reward history and compensate for that error while preserving the smoothness of quadratic rewards. Two cumulative-error schemes are proposed, enabling fast convergence and robust steady-state correction without modifying the network architecture. Empirical validation on Adaptive Cruise Control (ACC) and lane-change tasks shows reduced steady-state error and controlled state fluctuations, indicating practical applicability to vehicle control problems.

Abstract

The choice of reward function in Reinforcement Learning (RL) has attracted significant attention because of its impact on system performance. Quadratic reward functions often leave significant steady-state errors. Absolute-value-type reward functions alleviate this problem, but they tend to induce substantial fluctuations in specific system states, leading to abrupt changes. To address this, this study introduces an integral term into quadratic-type reward functions. The added term makes the RL algorithm account for reward history and thereby reduces steady-state errors. Experiments and performance evaluations on Adaptive Cruise Control (ACC) and lane change models validate that the proposed method effectively reduces steady-state errors without causing significant spikes in certain system states.
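
To make the idea of an integral term concrete, the following is a minimal sketch of a quadratic tracking reward with an added accumulated-error penalty. It is not the paper's formulation: the class name `IntegralQuadraticReward`, the weights `q` and `k_i`, the step `dt`, and the specific form (squared running integral of the error) are all assumptions chosen for illustration, and the two cumulative-error schemes proposed in the paper may differ.

```python
from dataclasses import dataclass


@dataclass
class IntegralQuadraticReward:
    """Quadratic reward plus an accumulated-error penalty (hypothetical form)."""

    q: float = 1.0         # weight on the instantaneous squared error (assumed)
    k_i: float = 0.1       # weight on the squared accumulated error (assumed)
    dt: float = 0.1        # integration step in seconds (assumed)
    integral: float = 0.0  # running integral of the tracking error

    def reset(self) -> None:
        """Clear the accumulated error, e.g. at the start of an episode."""
        self.integral = 0.0

    def __call__(self, error: float) -> float:
        """Return the reward for the current tracking error."""
        # Accumulate the error so that a small persistent offset keeps growing.
        self.integral += error * self.dt
        # Both terms are smooth in the error, unlike an absolute-value reward.
        return -(self.q * error**2 + self.k_i * self.integral**2)


if __name__ == "__main__":
    reward = IntegralQuadraticReward(q=1.0, k_i=1.0, dt=1.0)
    # A constant offset is penalized more and more at each step.
    print([reward(0.5) for _ in range(3)])
```

With a constant offset, the plain quadratic term stays fixed while the integral penalty grows, which is the intuition behind using reward history to push the residual error toward zero. Because such a reward depends on past errors, an implementation along these lines would likely also need to expose the accumulated error to the agent as part of its observation.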

Paper Structure

This paper contains 19 sections, 22 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Schematic for two-car following.
  • Figure 2: Spacing errors in the ACC model. (Only the first 400 of the 600 training timesteps are shown.)
  • Figure 3: The rate of change in vehicle acceleration for ACC. (Only the first 400 of the 600 training timesteps are shown.)
  • Figure 4: Dynamic bicycle model for lane change [ge2021numerically].
  • Figure 5: The distance between the car and the centerline in lane change. (Only the first 100 of the 150 training timesteps are shown.)
  • ...and 1 more figure