A Method for Fast Autonomy Transfer in Reinforcement Learning
Dinuka Sahabandu, Bhaskar Ramasubramanian, Michail Alexiou, J. Sukarno Mertoguno, Linda Bushnell, Radha Poovendran
TL;DR
The paper addresses rapid autonomy transfer in reinforcement learning by reusing pre-trained critic value functions from multiple environments. It introduces the Multi-Critic Actor-Critic (MCAC) algorithm, which forms a weighted ensemble of $N$ pre-trained critics to approximate the current environment's value function via $\hat{V}(s)=\sum_{i=1}^N w_i\bar{V}_i(s)$ with weights on the probability simplex. Weights are updated on a faster time-scale using a TD-error-driven rule, while the actor's policy updates occur on a slower time-scale, enabling stable convergence. Empirical results on two grid-world case studies show MCAC achieves up to $22.76\times$ faster autonomy transfer and higher rewards than a baseline actor-critic, highlighting the practical impact of cross-environment knowledge transfer in RL.
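As a rough illustration of the ensemble described above, the following sketch computes $\hat{V}(s)=\sum_{i=1}^N w_i\bar{V}_i(s)$ and applies one TD-error-driven weight update that keeps the weights on the probability simplex. The specific rule used here (a semi-gradient step followed by a Euclidean projection onto the simplex) is an assumption chosen for illustration; the summary only states that the update is driven by the TD error, and the paper's exact rule may differ. The function names, discount factor, and step size are likewise hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    n = v.size
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u)
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u * ks > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)      # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def ensemble_value(weights, critic_values):
    """V_hat(s) = sum_i w_i * V_bar_i(s) for one state."""
    return float(np.dot(weights, critic_values))


def update_weights(weights, v_s, v_next, reward, gamma=0.95, lr=0.1):
    """One illustrative weight update (assumed rule, not the paper's).

    v_s and v_next hold the N pre-trained critics' values at the current
    and next state. The pre-trained critics themselves stay frozen; only
    the mixing weights change.
    """
    td_error = reward + gamma * ensemble_value(weights, v_next) \
        - ensemble_value(weights, v_s)
    # Semi-gradient of V_hat(s) with respect to the weights is v_s.
    stepped = weights + lr * td_error * v_s
    return project_to_simplex(stepped), td_error
```

In a two-time-scale scheme such as the one summarized above, a step like `update_weights` would run with a larger step size than the actor's policy update, so the ensemble value estimate tracks the current policy before the policy itself moves.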
Abstract
This paper introduces a novel reinforcement learning (RL) strategy designed to facilitate rapid autonomy transfer by utilizing pre-trained critic value functions from multiple environments. Unlike traditional methods that require extensive retraining or fine-tuning, our approach reuses this existing knowledge, enabling an RL agent to adapt swiftly to new settings without heavy computational cost. Our contributions are the Multi-Critic Actor-Critic (MCAC) algorithm, an analysis establishing its convergence, and empirical evidence demonstrating its efficacy. Our experimental results show that MCAC significantly outperforms the baseline actor-critic algorithm, achieving up to $22.76\times$ faster autonomy transfer and higher reward accumulation. This advancement underscores the potential of leveraging accumulated knowledge for efficient adaptation in RL applications.
