Table of Contents
Fetching ...

Actor-Critic Pretraining for Proximal Policy Optimization

Andreas Kernbach, Amr Elsheikh, Nicolas Grupp, René Nagel, Marco F. Huber

TL;DR

A pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks, which shows that actor-critic pretraining improves sample efficiency by 86.1% on average compared to no pretraining and by 30.9% to actor-only pretraining.

Abstract

Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A common approach is actor pretraining, where the actor network is initialized via behavioral cloning on expert demonstrations and subsequently fine-tuned with RL. In contrast, the initialization of the critic network has received little attention, despite its central role in policy optimization. This paper proposes a pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks. The actor is pretrained via behavioral cloning, while the critic is pretrained using returns obtained from rollouts of the pretrained policy. The approach is evaluated on 15 simulated robotic manipulation and locomotion tasks. Experimental results show that actor-critic pretraining improves sample efficiency by 86.1% on average compared to no pretraining and by 30.9% to actor-only pretraining.

Actor-Critic Pretraining for Proximal Policy Optimization

TL;DR

A pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks, which shows that actor-critic pretraining improves sample efficiency by 86.1% on average compared to no pretraining and by 30.9% to actor-only pretraining.

Abstract

Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A common approach is actor pretraining, where the actor network is initialized via behavioral cloning on expert demonstrations and subsequently fine-tuned with RL. In contrast, the initialization of the critic network has received little attention, despite its central role in policy optimization. This paper proposes a pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks. The actor is pretrained via behavioral cloning, while the critic is pretrained using returns obtained from rollouts of the pretrained policy. The approach is evaluated on 15 simulated robotic manipulation and locomotion tasks. Experimental results show that actor-critic pretraining improves sample efficiency by 86.1% on average compared to no pretraining and by 30.9% to actor-only pretraining.
Paper Structure (13 sections, 9 equations, 3 figures, 1 table)

This paper contains 13 sections, 9 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Visualization of the pretraining and fine-tuning approach using expert data with an actor-critic model. First an expert policy is used to pretrain actor and critic, which are subsequently fine-tuned using PPO.
  • Figure 2: Comparison of the learning results in PPO fine-tuning given ACP, AP and NP in three exemplary environments. Returns are averaged over three random seeds for fine-tuning, with shaded regions indicating the standard deviation.
  • Figure 3: Evaluation of the amount of expert and rollout steps.