Maintaining Plasticity in Reinforcement Learning: A Cost-Aware Framework for Aerial Robot Control in Non-stationary Environments

Ali Tahir Karasahin, Ziniu Wu, Basaran Bahadir Kocer

TL;DR

This work tackles plasticity loss in reinforcement learning for aerial robots operating in non-stationary environments, notably variable winds. It introduces RECOM, a retrospective cost mechanism inspired by cognitive neuroscience that dynamically adjusts the PPO learning rate by balancing recent rewards and losses. Empirical results in RotorPy show that RECOM combined with L2 regularization yields the best long-term performance, with a 90% success rate in hovering under wind disturbances and 11.29% fewer dormant units than a baseline L2-PPO setup. The approach targets robust, long-horizon training for autonomous aerial control and offers a framework for further real-world validation and cross-task generalization.

Abstract

Reinforcement learning (RL) has demonstrated the ability to maintain the plasticity of the policy throughout short-term training in aerial robot control. However, these policies have been shown to lose plasticity when training is extended to long-term learning in non-stationary environments. For example, the standard proximal policy optimization (PPO) policy is observed to collapse in long-term training settings, leading to significant degradation in control performance. To address this problem, this work proposes a cost-aware framework that uses a retrospective cost mechanism (RECOM) to balance rewards and losses in RL training with a non-stationary environment. Using a cost gradient relation between rewards and losses, our framework dynamically updates the learning rate to actively train the control policy in a disturbed wind environment. Our experimental results show that the framework learns a policy for the hovering task without policy collapse in variable wind conditions and yields 11.29% fewer dormant units than PPO with L2 regularization.
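
The abstract does not state the RECOM update rule, so the sketch below is not the authors' method. It only illustrates the general idea of a learning rate driven by recent reward and loss trends. The function name `adapt_learning_rate`, the half-window trend estimate, the `tanh` squashing, and the parameters `gain`, `lr_min`, and `lr_max` are all assumptions made for this example.

```python
import math
from statistics import fmean
from typing import Sequence


def adapt_learning_rate(
    lr: float,
    rewards: Sequence[float],
    losses: Sequence[float],
    gain: float = 0.1,      # hypothetical step-size factor
    lr_min: float = 1e-5,   # hypothetical lower bound
    lr_max: float = 1e-3,   # hypothetical upper bound
) -> float:
    """Hypothetical rule (NOT the paper's RECOM equations).

    Looks back over a window of recent rewards and losses, estimates their
    trends, and nudges the learning rate up when training is getting worse
    (reward falling, loss rising) and down when it is improving.
    """
    if len(rewards) < 2 or len(losses) < 2:
        return lr  # not enough history to estimate a trend

    def trend(xs: Sequence[float]) -> float:
        # Difference between the mean of the newer and older half of the window.
        half = len(xs) // 2
        return fmean(xs[half:]) - fmean(xs[:half])

    # Assumed "cost" signal: positive when loss rises and/or reward drops.
    cost = trend(losses) - trend(rewards)

    # Bounded multiplicative update, then clip to the allowed range.
    new_lr = lr * math.exp(gain * math.tanh(cost))
    return min(max(new_lr, lr_min), lr_max)
```

In a PPO loop, a function like this could be called once per update with the most recent episode returns and losses, and its result written into the optimizer before the next gradient step. The multiplicative, clipped form is chosen here only so that a single noisy window cannot move the learning rate far.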

Paper Structure

This paper contains 16 sections, 8 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Example of aerial robot hovering from different initial positions under variable wind conditions.
  • Figure 2: Our RECOM framework uses loss and reward information during training. RECOM is designed to adapt to non-stationary environments in RL. It calculates a dynamically updated learning rate that balances rewards and losses.
  • Figure 3: Comparison of different reinforcement learning agents in training performance with wind disturbance.
  • Figure 4: Change of dormant units in the policy network during training under the wind disturbance.
  • Figure 5: Change of dormant units in the policy network during training without the wind disturbance.
  • ...and 2 more figures
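
Figures 4 and 5 track dormant units in the policy network, but this listing does not say how they are counted. A minimal sketch follows, assuming the commonly used definition in which a unit is dormant when its mean absolute activation, normalized by the layer average, is at or below a threshold `tau`. The function name and the default threshold are illustrative and are not taken from the paper.

```python
from typing import Sequence


def dormant_fraction(activations: Sequence[Sequence[float]], tau: float = 0.0) -> float:
    """Fraction of dormant units in one layer.

    activations[i][j] is the post-activation output of unit j on input i.
    A unit counts as dormant if its normalized score is <= tau
    (assumed definition, see the note above).
    """
    n_samples = len(activations)
    n_units = len(activations[0])

    # Mean absolute activation of each unit over the batch.
    mean_abs = [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_units)
    ]

    # Normalize by the layer-wide average so scores are comparable across layers.
    layer_mean = sum(mean_abs) / n_units
    if layer_mean == 0.0:
        return 1.0  # every unit is silent on this batch

    scores = [m / layer_mean for m in mean_abs]
    return sum(1 for s in scores if s <= tau) / n_units


# Two of the four units never fire on this batch.
batch = [[0.0, 1.0, 2.0, 0.0],
         [0.0, 3.0, 2.0, 0.0]]
print(dormant_fraction(batch))  # 0.5
```

Evaluating this per hidden layer on a fixed batch of observations at regular training intervals would give curves of the kind the figure captions describe.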