Constant in an Ever-Changing World
Andy Wu, Chun-Cheng Lin, Yuehua Huang, Rung-Tzuo Liaw
TL;DR
The paper addresses training instability in reinforcement learning, particularly in Actor-Critic methods, by proposing Constant in an Ever-Changing World (CIC), a two-actor framework that stabilizes critic updates. CIC maintains a stable representative policy (actor1) and a trainable current policy (actor2), and mixes their contributions to the critic update through a coefficient λ, which breaks harmful feedback loops without adding computational overhead. λ is adapted over time using a Lambda buffer, keeping performance robust across environments and training phases. On five MuJoCo continuous-control tasks, CIC variants improve performance and reduce instability relative to baseline methods; the adaptive-λ variant performs best, and the approach remains simple to integrate into existing Actor-Critic algorithms.
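As a rough illustration of the mixing idea, the sketch below forms a TD target in which the next-state value is a λ-weighted blend of the critic's estimates under actor1's and actor2's actions. The convex-combination form, the choice to weight the representative policy by λ, and the names mixed_td_target, q_next_rep, and q_next_cur are all assumptions made for illustration; this summary does not specify the exact update used in the paper.

```python
import numpy as np

def mixed_td_target(reward, done, q_next_rep, q_next_cur, lam, gamma=0.99):
    """Hypothetical mixed TD target (an assumption, not the paper's stated rule).

    q_next_rep: critic estimate Q(s', a1') with a1' chosen by the representative policy
    q_next_cur: critic estimate Q(s', a2') with a2' chosen by the current policy
    lam:        mixing coefficient in [0, 1]; 1.0 uses only the representative policy
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    q_next_rep = np.asarray(q_next_rep, dtype=float)
    q_next_cur = np.asarray(q_next_cur, dtype=float)

    # Blend the two next-state value estimates with weight lam.
    q_mix = lam * q_next_rep + (1.0 - lam) * q_next_cur

    # Standard one-step bootstrapped target; terminal transitions do not bootstrap.
    return reward + gamma * (1.0 - done) * q_mix
```

Under these assumptions, lam = 0 recovers the usual single-actor target, so a setup like this could be dropped into an existing critic update by swapping only the target computation.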
Abstract
The training process of reinforcement learning often suffers from severe oscillations, leading to instability and degraded performance. In this paper, we propose a Constant in an Ever-Changing World (CIC) framework that enhances algorithmic stability to improve performance. CIC maintains both a representative policy and a current policy. Instead of updating the representative policy blindly, CIC selectively updates it only when the current policy demonstrates superiority. Furthermore, CIC employs an adaptive adjustment mechanism, enabling the representative and current policies to jointly facilitate critic training. We evaluate CIC on five MuJoCo environments, and the results show that CIC improves the performance of conventional algorithms without incurring additional computational cost.
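The selective update of the representative policy can be sketched as a simple gate: the representative is overwritten only when the current policy scores higher. How the paper measures "superiority" is not stated in the abstract, so the scalar score used here (for example, an average evaluation return), the strict-inequality test, and the function name maybe_update_representative are assumptions for illustration only.

```python
import copy

def maybe_update_representative(rep_params, rep_score, cur_params, cur_score):
    """Hypothetical gate for the representative policy (an assumed criterion).

    rep_params / cur_params: policy parameters, e.g. a dict of weight arrays
    rep_score / cur_score:   scalar performance estimates for each policy
    Returns the (possibly updated) representative parameters and score.
    """
    if cur_score > rep_score:
        # The current policy is better: snapshot it as the new representative.
        # A deep copy keeps the snapshot fixed while the current policy keeps training.
        return copy.deepcopy(cur_params), cur_score
    # Otherwise keep the existing representative unchanged.
    return rep_params, rep_score
```

The deep copy is the one design choice that matters in this sketch: without it, later gradient steps on the current policy would silently change the representative as well.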
