Putting the Iterative Training of Decision Trees to the Test on a Real-World Robotic Task

Raphael C. Engelhardt, Marcel J. Meinen, Moritz Lange, Laurenz Wiskott, Wolfgang Konen

TL;DR

Problem and approach: This work tests an iterative method to distill DTs from a DRL policy for a real-world robotic control task (CartPole Swing-Up, CPSU). It uses an alternating loop where DTs guide exploration of the state space and a DRL agent supplies the correct actions, producing labeled samples for DT training; the real-world CPSU task introduces noise and delays not present in simulation. Key findings: after 10 iterations, the best DT achieved $\overline{R}=7594.87 \pm 826.85$ on five evaluation episodes, closely matching the DRL oracle at $\overline{R}=7138.83 \pm 1517.47$, and could be pruned to about 36% fewer parameters than the DQN; base sampling required careful curation to avoid bias toward upright states. Significance: the results demonstrate the feasibility of distilling transparent, lightweight DTs from DRL controllers for real-world reinforcement learning tasks.

Abstract

In previous research, we developed methods to train decision trees (DTs) as agents for reinforcement learning tasks, based on deep reinforcement learning (DRL) networks. The samples from which the DTs are built use the environment's state as features and the corresponding action as label. Selecting these samples is nontrivial: they must reflect the DRL agent's ability to choose the right action, yet also cover enough of the state space for the DT to generalize well. To solve this task, we developed an algorithm to iteratively train DTs. In this short paper, we apply this algorithm to a real-world implementation of a robotic task for the first time. Real-world tasks pose additional challenges compared to simulations, such as noise and delays. The task consists of a physical pendulum attached to a cart, which moves on a linear track. Through movements of the cart to the left and to the right, the pendulum is to be swung into the upright position and balanced in the unstable equilibrium. Our results demonstrate the applicability of the algorithm to real-world tasks by generating a DT whose performance matches that of the DRL agent while consisting of fewer parameters. This research could be a starting point for distilling DTs from DRL agents to obtain transparent, lightweight models for real-world reinforcement learning tasks.
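To make the alternating loop concrete, the following is a minimal sketch under strong assumptions, not the authors' implementation. It replaces the physical CPSU setup with a hypothetical 1-D toy environment (`step`), replaces the DRL agent with a hand-written `oracle`, and uses a tiny hand-rolled tree learner (`fit_tree`); all names, dynamics, and parameter values are illustrative. What it does preserve is the idea described above: the current DT decides which states are visited, and the oracle labels those states with its own action.

```python
import random

def oracle(state):
    """Stand-in for the DRL agent (assumption): push the cart toward the origin."""
    x, v = state
    return 1 if x + 0.5 * v < 0 else 0

def step(state, action):
    """Hypothetical noisy toy dynamics; not the CPSU hardware."""
    x, v = state
    v = 0.9 * v + (0.1 if action == 1 else -0.1) + random.gauss(0, 0.01)
    x = x + 0.1 * v
    return (x, v), -abs(x)  # reward is higher the closer the cart is to 0

def rollout(policy, n_steps=50):
    """Run one episode; return the visited states and the episode return."""
    state = (random.uniform(-1, 1), random.uniform(-1, 1))
    states, total = [], 0.0
    for _ in range(n_steps):
        states.append(state)
        state, reward = step(state, policy(state))
        total += reward
    return states, total

def gini(labels):
    """Gini impurity for binary labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_tree(samples, depth=3):
    """Tiny axis-aligned tree: a leaf is an int, a node is (feature, threshold, lo, hi)."""
    labels = [a for _, a in samples]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == 0 or gini(labels) == 0:
        return majority
    best = None
    for f in range(2):
        values = sorted(s[f] for s, _ in samples)
        # Try roughly 16 evenly spaced candidate thresholds per feature.
        for t in values[:: max(1, len(values) // 16)]:
            left = [a for s, a in samples if s[f] <= t]
            right = [a for s, a in samples if s[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    lo = [(s, a) for s, a in samples if s[f] <= t]
    hi = [(s, a) for s, a in samples if s[f] > t]
    return (f, t, fit_tree(lo, depth - 1), fit_tree(hi, depth - 1))

def predict(tree, state):
    """Walk from the root to a leaf and return the leaf's action."""
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if state[f] <= t else hi
    return tree

def iterative_distillation(n_iterations=5, n_base=5, n_explore=3, seed=0):
    random.seed(seed)
    # Base samples: states visited by the oracle, labeled with its actions.
    samples = []
    for _ in range(n_base):
        states, _ = rollout(oracle)
        samples += [(s, oracle(s)) for s in states]
    history = []
    for _ in range(n_iterations):
        tree = fit_tree(samples)
        policy = lambda s, tree=tree: predict(tree, s)
        returns = []
        for _ in range(n_explore):
            # The DT chooses the actions (exploration); the oracle labels the states.
            states, ret = rollout(policy)
            samples += [(s, oracle(s)) for s in states]
            returns.append(ret)
        history.append(sum(returns) / len(returns))
    return tree, history, samples

if __name__ == "__main__":
    tree, history, samples = iterative_distillation()
    print("mean return of the DT per iteration:", history)
    print("number of labeled samples:", len(samples))
```

The sketch trains a single tree per iteration and evaluates it on the same episodes that generate new samples; training several trees per iteration and selecting among them, as well as pruning, are left out for brevity.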

Paper Structure

This paper contains 8 sections, 2 equations, 4 figures.

Figures (4)

  • Figure 1: Photograph of the real-world implementation of the CPSU task. Adapted from nayante
  • Figure 2: Histograms showing the performance of the $92$ episodes of the DQN agent. The dashed red vertical line marks the median.
  • Figure 3: Performance of DTs evolving with iterations. Each dot represents the average return of one DT in $n_e=5$ evaluation episodes. The dashed red, solid blue, and dash-dotted green lines connect the worst, median, and best of the $N_T=10$ DTs, respectively. The orange dotted line and shaded area mark the DQN's return of $\overline{R} = 7138.83 \pm 1517.47$ in the $100$ episodes from which the $92$ episodes for the base samples were selected.
  • Figure 4: Comparison between the performance of the DQN, the DT trained on plain samples (iteration $0$), and the DT trained with the iterative algorithm. Shown are boxplots of the returns of the $100$ episodes of the DQN and of the $n_e=5$ episodes of the best DT in iterations $0$ and $7$, respectively. Each dot marks the return of a single episode. Also shown are the six outlier DQN episodes that were discarded when compiling the base samples.