Velocity-History-Based Soft Actor-Critic Tackling IROS'24 Competition "AI Olympics with RealAIGym"

Tim Lukas Faust, Habib Maraqten, Erfan Aghadavoodi, Boris Belousov, Jan Peters

TL;DR

A novel solution submitted to the IROS'24 competition that builds upon Soft Actor-Critic (SAC), a popular model-free entropy-regularized Reinforcement Learning (RL) algorithm. It adds a 'context' vector to the state, encoding the immediate history via a Convolutional Neural Network (CNN) to counteract unmodeled effects on the real system.

Abstract

The "AI Olympics with RealAIGym" competition challenges participants to stabilize chaotic underactuated dynamical systems with advanced control algorithms. In this paper, we present a novel solution submitted to the IROS'24 competition, which builds upon Soft Actor-Critic (SAC), a popular model-free entropy-regularized Reinforcement Learning (RL) algorithm. We add a 'context' vector to the state, which encodes the immediate history via a Convolutional Neural Network (CNN) to counteract unmodeled effects on the real system. Our method achieves high performance scores and competitive robustness scores on both tracks of the competition: Pendubot and Acrobot.

Paper Structure

This paper contains 14 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Our model architecture for encoding the history into a context representation. A sequence of past velocity measurements is passed through convolutional and fully-connected layers, and the output is attached to the current measurement before being passed to the actor and critic in SAC.
  • Figure 2: Successful swing-up on a simulated Pendubot system.
  • Figure 3: Robustness Metrics for Pendubot Controller.
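
The data flow described in the Figure 1 caption (velocity history, then convolutional and fully-connected layers, then concatenation with the current measurement) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the layer sizes, the history length, the ReLU nonlinearity, the random weights, the layout of the measurement vector, and every function name below are assumptions made for illustration. In practice such an encoder would presumably be trained together with the SAC actor and critic, which this sketch does not attempt.

```python
import random


def conv1d(history, kernels, biases):
    """Valid 1D convolution over time followed by ReLU.

    history: T x C nested list (T time steps, C velocity channels)
    kernels: K x W x C nested list (K filters of temporal width W)
    returns: (T - W + 1) x K nested list
    """
    width = len(kernels[0])
    out = []
    for t in range(len(history) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            s = bias
            for dt in range(width):
                for c, w in enumerate(kernel[dt]):
                    s += w * history[t + dt][c]
            row.append(max(0.0, s))  # ReLU (assumed nonlinearity)
        out.append(row)
    return out


def dense(x, weights, biases):
    """Fully-connected layer without activation: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def encode_context(velocity_history, params):
    """Map a window of past velocities to a context vector."""
    features = conv1d(velocity_history, params["kernels"], params["conv_bias"])
    flat = [v for row in features for v in row]
    return dense(flat, params["fc_weights"], params["fc_bias"])


def augment_state(measurement, velocity_history, params):
    """Append the context vector to the current measurement."""
    return list(measurement) + encode_context(velocity_history, params)


def init_params(history_len, n_joints, n_kernels, width, context_dim, seed=0):
    """Random placeholder weights; a real encoder would be learned."""
    rng = random.Random(seed)
    flat_dim = (history_len - width + 1) * n_kernels
    return {
        "kernels": [[[rng.uniform(-0.1, 0.1) for _ in range(n_joints)]
                     for _ in range(width)] for _ in range(n_kernels)],
        "conv_bias": [0.0] * n_kernels,
        "fc_weights": [[rng.uniform(-0.1, 0.1) for _ in range(flat_dim)]
                       for _ in range(context_dim)],
        "fc_bias": [0.0] * context_dim,
    }


# Hypothetical sizes: 10 past steps, 2 joints, 4-dimensional context.
HISTORY_LEN, N_JOINTS, CONTEXT_DIM = 10, 2, 4
params = init_params(HISTORY_LEN, N_JOINTS, n_kernels=3, width=3,
                     context_dim=CONTEXT_DIM)
history = [[0.01 * t, -0.02 * t] for t in range(HISTORY_LEN)]
measurement = [0.1, -0.2, 0.09, -0.18]  # assumed: two angles, two velocities
augmented = augment_state(measurement, history, params)
print(len(augmented))  # 8 = 4 measurement entries + 4 context entries
```

The augmented vector is what the actor and critic would receive in place of the raw measurement. Keeping the encoder separate from `augment_state` makes it straightforward to swap in a learned network while leaving the concatenation step unchanged.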