Upside-Down Reinforcement Learning for More Interpretable Optimal Control

Juan Cardenas-Cartagena, Massimiliano Falzari, Marco Zullich, Matthia Sabatelli

TL;DR

This paper investigates whether function approximation algorithms other than Neural Networks (NNs) can be used within an Upside-Down Reinforcement Learning (UDRL) framework. It shows that tree-based methods such as Random Forests and Extremely Randomized Trees can perform just as well as NNs, with the significant benefit of yielding policies that are inherently more interpretable.

Abstract

Model-Free Reinforcement Learning (RL) algorithms either learn how to map states to expected rewards or search for policies that can maximize a certain performance function. Model-Based algorithms, instead, aim to learn an approximation of the underlying model of the RL environment and then use it in combination with planning algorithms. Upside-Down Reinforcement Learning (UDRL) is a novel learning paradigm that aims to learn how to predict actions from states and desired commands. This task is formulated as a Supervised Learning problem and has successfully been tackled by Neural Networks (NNs). In this paper, we investigate whether function approximation algorithms other than NNs can also be used within a UDRL framework. Our experiments, performed over several popular optimal control benchmarks, show that tree-based methods like Random Forests and Extremely Randomized Trees can perform just as well as NNs, with the significant benefit of resulting in policies that are inherently more interpretable, therefore paving the way for more transparent, safe, and robust RL.
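To make the supervised formulation concrete, the following minimal sketch shows one way a recorded episode could be turned into (input, target) pairs for a behavior function. It assumes, as in the original UDRL formulation, that a command consists of a desired return and a desired horizon. The function name `build_training_rows` and the `(state, action, reward)` tuple layout are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical episode layout: one (state, action, reward) tuple per time step.
Episode = List[Tuple[Sequence[float], int, float]]


def build_training_rows(episode: Episode) -> Tuple[List[List[float]], List[int]]:
    """Turn one episode into supervised (input, target) pairs.

    For every start step t1 and every end step t2 >= t1, the input is the
    state at t1 concatenated with the command (return collected from t1 to
    t2, number of steps from t1 to t2), and the target is the action that
    was actually taken at t1. The command layout is an assumption.
    """
    inputs: List[List[float]] = []
    targets: List[int] = []
    for t1 in range(len(episode)):
        state, action, _ = episode[t1]
        collected_return = 0.0
        for t2 in range(t1, len(episode)):
            collected_return += episode[t2][2]  # add the reward received at t2
            horizon = t2 - t1 + 1               # steps covered by the segment
            inputs.append(list(state) + [collected_return, float(horizon)])
            targets.append(action)
    return inputs, targets


if __name__ == "__main__":
    demo_episode: Episode = [
        ([0.0, 1.0], 1, 1.0),
        ([0.1, 0.9], 0, 1.0),
        ([0.2, 0.8], 1, 1.0),
    ]
    X, y = build_training_rows(demo_episode)
    for row, target in zip(X, y):
        print(row, "->", target)
```

Because the rows are plain feature vectors with discrete action labels, they could be handed to any classifier exposing a `fit(X, y)` method, for example a tree ensemble, which is the kind of substitution for NNs that the paper investigates.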

Paper Structure

This paper contains 17 sections, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: A simple MDP whose behavior function $f$ is summarized in Table \ref{tab:behavior_function}.
  • Figure 2: Comparison of the performance of the six different tested behavior functions (NN, RF, ET, KNN, AdaBoost, and XGBoost) on the three OpenAI Gym environments: CartPole, Acrobot, and Lunar Lander. The results are shown in terms of rewards per episode and are averaged over five different training runs.
  • Figure 3: Feature importance scores coming from a trained RF behavior function computed for three different states of the CartPole environment.
  • Figure 4: Feature importance scores coming from a trained ET behavior function computed for three different states of the Acrobot environment.
  • Figure 5: Feature importance scores coming from a trained RF behavior function computed for three different states of the Lunar Lander environment.