Maintaining Plasticity in Reinforcement Learning: A Cost-Aware Framework for Aerial Robot Control in Non-stationary Environments
Ali Tahir Karasahin, Ziniu Wu, Basaran Bahadir Kocer
TL;DR
This work tackles the problem of plasticity loss in reinforcement learning for aerial robots operating in non-stationary environments, notably variable winds. It introduces RECOM, a retrospective cost mechanism that dynamically adjusts the PPO learning rate by balancing recent rewards and losses, inspired by cognitive neuroscience. Empirical results in RotorPy show that RECOM combined with L2 regularization yields the best long-term performance, with a 90% success rate in hovering under wind disturbances and 11.29% fewer dormant units than a baseline L2-PPO setup. The proposed approach advances robust, long-horizon training for autonomous aerial control and offers a framework for further real-world validation and cross-task generalization.
Abstract
Reinforcement learning (RL) has demonstrated the ability to maintain policy plasticity during short-term training for aerial robot control. However, these policies have been shown to lose plasticity when training is extended to long-term learning in non-stationary environments. For example, a standard proximal policy optimization (PPO) policy is observed to collapse in long-term training, leading to significant degradation in control performance. To address this problem, this work proposes a cost-aware framework that uses a retrospective cost mechanism (RECOM) to balance rewards and losses during RL training in a non-stationary environment. Using a cost gradient relation between rewards and losses, our framework dynamically updates the learning rate to keep the control policy actively training in an environment with wind disturbances. Our experimental results show that the framework learns a hovering policy without policy collapse under variable wind conditions and yields 11.29% fewer dormant units than PPO with L2 regularization.
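The abstract reports plasticity in terms of dormant units but does not define the metric here. The minimal sketch below assumes the commonly used tau-dormant definition, in which a unit counts as dormant when its mean absolute activation, normalized by the layer average, is at or below a threshold tau. The function name `dormant_fraction`, the threshold default, and the toy activations are illustrative assumptions and may not match the authors' measurement.

```python
from typing import Sequence


def dormant_fraction(activations: Sequence[Sequence[float]], tau: float = 0.0) -> float:
    """Fraction of units in one layer whose normalized activation score is <= tau.

    activations[b][i] is the post-activation output of unit i for input b.
    The tau-dormant definition used here is an assumption, not the paper's stated metric.
    """
    batch = len(activations)
    n_units = len(activations[0])
    # Mean absolute activation of each unit over the batch.
    mean_abs = [sum(abs(row[i]) for row in activations) / batch for i in range(n_units)]
    # Normalize by the layer-wide average so the score is scale-free.
    layer_mean = sum(mean_abs) / n_units
    if layer_mean == 0.0:
        return 1.0  # every unit in the layer is silent
    scores = [m / layer_mean for m in mean_abs]
    return sum(1 for s in scores if s <= tau) / n_units


# Toy layer with four units; units 0 and 3 never activate.
acts = [
    [0.0, 1.2, 0.4, 0.0],
    [0.0, 0.8, 0.6, 0.0],
    [0.0, 1.0, 0.2, 0.0],
]
print(dormant_fraction(acts))
```

Under this assumed definition, a lower dormant fraction after long training suggests that more of the network can still respond to new data, which is the sense in which the comparison against PPO with L2 regularization can be read.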
