Constant in an Ever-Changing World

Andy Wu, Chun-Cheng Lin, Yuehua Huang, Rung-Tzuo Liaw

TL;DR

The paper addresses training instability in reinforcement learning, particularly in Actor-Critic methods, by proposing Constant in an Ever-Changing World (CIC), a two-actor framework that stabilizes critic updates. CIC maintains a stable representative policy (actor1) and a trainable current policy (actor2). A mixing coefficient λ combines the two policies' contributions during critic updates, which is intended to break the harmful feedback loop between actor and critic without adding computational overhead. λ is adapted over time using a Lambda buffer, so that the mixing weight can change across environments and training phases. On five MuJoCo continuous-control tasks, CIC variants improve performance and reduce instability relative to the baseline methods, with adaptive λ giving the best results. The approach is simple to integrate into existing Actor-Critic algorithms.

Abstract

The training process of reinforcement learning often suffers from severe oscillations, leading to instability and degraded performance. In this paper, we propose a Constant in an Ever-Changing World (CIC) framework that enhances algorithmic stability to improve performance. CIC maintains both a representative policy and a current policy. Instead of updating the representative policy blindly, CIC selectively updates it only when the current policy demonstrates superiority. Furthermore, CIC employs an adaptive adjustment mechanism, enabling the representative and current policies to jointly facilitate critic training. We evaluate CIC on five MuJoCo environments, and the results show that CIC improves the performance of conventional algorithms without incurring additional computational cost.
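
The two mechanisms described above (a λ-weighted mix of both policies in the critic update, and a representative policy that is replaced only when the current policy does better) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the summary does not give the exact update rule, so the convex combination of next-state value estimates, the comparison by scalar return, and the names `mixed_td_target` and `should_promote` are all assumptions.

```python
import numpy as np


def mixed_td_target(reward, done, q_next_rep, q_next_cur, lam, gamma=0.99):
    """Blend both actors' next-state value estimates into one TD target.

    Assumption: the mix is a convex combination weighted by lam, where
    q_next_rep is the critic's value for the representative policy's action
    and q_next_cur is the value for the current policy's action.
    """
    q_next = lam * q_next_rep + (1.0 - lam) * q_next_cur
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_next


def should_promote(rep_return, cur_return):
    """Hypothetical selective-update rule: replace the representative policy
    only when the current policy's evaluated return is strictly higher."""
    return cur_return > rep_return


# Toy batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
q_rep = np.array([10.0, 5.0])
q_cur = np.array([20.0, 7.0])

target = mixed_td_target(reward, done, q_rep, q_cur, lam=0.5, gamma=0.9)
```

With `lam=1.0` the target in this sketch depends only on the representative policy, and with `lam=0.0` it reduces to the usual single-actor target, which is why an adaptive λ can shift weight between the two over the course of training.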

Paper Structure

This paper contains 12 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

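  • Figure 1: Instability of Actor-Critic algorithms
  • Figure 2: CIC architecture diagram
  • Figure 3: CIC comparison experiments
  • Figure 4: CIC-TD3 analysis experiment with fixed $\lambda$
  • Figure 5: Adaptive adjustment of CIC's $\lambda$ over the course of training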