Solving Offline Reinforcement Learning with Decision Tree Regression

Prajwal Koirala; Cody Fleming

TL;DR

Two distinct frameworks are introduced: return-conditioned and return-weighted decision tree policies (RCDTP and RWDTP), both of which achieve notable speed in agent training as well as inference, with training typically lasting less than a few minutes.

Abstract

This study presents a novel approach to offline reinforcement learning (RL) that reframes the problem as a regression task solvable with decision trees. Specifically, we introduce two distinct frameworks: return-conditioned and return-weighted decision tree policies (RCDTP and RWDTP), both of which achieve notable speed in agent training as well as inference, with training typically lasting less than a few minutes. Despite the simplification inherent in this reformulation of offline RL, our agents demonstrate performance that is at least on par with established methods. We evaluate our methods on D4RL datasets for locomotion and manipulation, as well as on other robotic tasks involving wheeled and flying robots. Additionally, we assess performance in delayed/sparse reward scenarios and highlight the explainability of these policies through action distributions and feature importance.
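To make the regression framing concrete, the following is a minimal sketch of how one logged trajectory could be turned into supervised training data for each of the two policy types. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `power` exponent, and the min-max weighting scheme are hypothetical, since the abstract does not specify how returns enter the regression.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Discounted sum of future rewards at every timestep of one trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last timestep backwards.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def return_conditioned_dataset(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return-conditioned view: features are (state, return-to-go), target is the action."""
    rtg = returns_to_go(rewards)
    features = [list(s) + [g] for s, g in zip(states, rtg)]
    targets = [list(a) for a in actions]
    return features, targets


def return_weighted_dataset(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
    power: float = 1.0,
) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Return-weighted view: features are the state alone; each sample gets a weight.

    Assumption of this sketch: weights are min-max normalized returns-to-go
    raised to `power`, so higher-return samples count more in the regression.
    """
    rtg = returns_to_go(rewards)
    lo, hi = min(rtg), max(rtg)
    span = (hi - lo) or 1.0  # avoid division by zero when all returns are equal
    weights = [((g - lo) / span) ** power for g in rtg]
    features = [list(s) for s in states]
    targets = [list(a) for a in actions]
    return features, targets, weights


# Toy trajectory with three timesteps (made-up numbers for illustration only).
states = [[0.0, 1.0], [0.5, 1.0], [1.0, 0.0]]
actions = [[0.1], [0.2], [0.3]]
rewards = [1.0, 0.0, 2.0]

X_rc, y_rc = return_conditioned_dataset(states, actions, rewards)
X_rw, y_rw, w_rw = return_weighted_dataset(states, actions, rewards)
```

Either dataset can then be handed to any tree regressor; for the return-weighted view the regressor must accept per-sample weights, as scikit-learn's `DecisionTreeRegressor.fit(X, y, sample_weight=w)` does. At inference time, a return-conditioned policy would additionally need a target return appended to the observed state before prediction.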

Paper Structure (31 sections, 11 equations, 10 figures, 9 tables, 1 algorithm)

Introduction
Preliminaries and Related Works
Offline Reinforcement Learning
Decision Tree Regression
Related Works
RWDTP:
RCDTP:
RWDTP and RCDTP frameworks
RWDTP:
RCDTP:
Optimal Policy:
Policy Training Implementation:
Experimental Results
Gym Mujoco Locomotion
Gym Robot Manipulation - Adroit and Kitchen Tasks
...and 16 more sections

Figures (10)

Figure 1: Decision Tree Policies in Classical Control Environments with Different Action Spaces. Both methods achieve expert-level returns in both environments using medium-level demonstration datasets from d3rlpy [seno2022d3rlpy], with every training run completing within a second.
Figure 2: Decision Tree Policies Applied to Wheeled and Flying Robots for Assessment of Zero-Shot Transfer of the Learned Policy. (a) Return-conditioning in Different F1tenth Racetracks. (b) Goal-conditioning for Different Heights Using RWDTP in the Pybullet Drones Simulation.
Figure 3: Comparison Between Decision Tree Policies in Hopper Expert Dataset
Figure 4: Impact of Hyperparameter p on Normalized Returns During Evaluation
Figure 5: Returns Distributions Comparison Between RCDTP and Decision Transformer
...and 5 more figures
