Decision Trees That Remember: Gradient-Based Learning of Recurrent Decision Trees with Memory
Sascha Marton, Moritz Schneider
TL;DR
This work addresses the challenge of modeling long-range temporal dependencies with interpretable decision structures. It introduces ReMeDe Trees, a gradient-trained recurrent decision-tree architecture with an internal memory whose states lie in $M = \mathbb{R}^{n_m}$. The model extends GradTree by operating on the joint input-state space $\tilde{X} = X \times M$, uses memory gating to update the hidden state, and learns both routing decisions and memory dynamics via backpropagation through time. On five synthetic proof-of-concept (PoC) tasks, ReMeDe Trees reach perfect test accuracy with compact trees, learn recurrent behavior, and effectively manipulate their internal memory, matching LSTM baselines on these tasks. The approach aims to bridge the interpretability of axis-aligned decision trees and the sequence-modeling power of recurrent architectures, with potential for integration into ensembles and broader time-series applications.
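To make the recurrence concrete, here is a minimal forward-pass sketch built on our own assumptions, not the authors' implementation. A single hand-set tree routes the augmented vector $(x_t, m_t)$ with hard axis-aligned splits, and each leaf stores an output, a memory candidate $c$, and a scalar write gate $g$, giving the update $m_{t+1} = g\,c + (1-g)\,m_t$. The names `Leaf`, `Node`, `route`, and `run_sequence`, as well as this gate parameterization, are hypothetical; the paper's gating and its gradient-based training through time are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    output: float               # prediction emitted at this time step
    mem_candidate: List[float]  # proposed new memory content (length n_m)
    gate: float                 # hypothetical write gate in [0, 1]


@dataclass
class Node:
    feature: int      # index into the concatenated vector (x_t, m_t)
    threshold: float  # hard, axis-aligned split: go left if z[feature] <= threshold
    left: Union[Node, Leaf]
    right: Union[Node, Leaf]


def route(tree: Union[Node, Leaf], z: List[float]) -> Leaf:
    """Follow hard axis-aligned splits from the root down to exactly one leaf."""
    while isinstance(tree, Node):
        tree = tree.left if z[tree.feature] <= tree.threshold else tree.right
    return tree


def run_sequence(tree: Union[Node, Leaf], xs: List[List[float]],
                 n_m: int) -> Tuple[List[float], List[float]]:
    memory = [0.0] * n_m  # m_0: memory starts empty
    outputs = []
    for x in xs:
        z = list(x) + memory  # augmented input-state vector (x_t, m_t)
        leaf = route(tree, z)
        outputs.append(leaf.output)
        # Assumed gated update: m_{t+1} = g * candidate + (1 - g) * m_t
        memory = [leaf.gate * c + (1.0 - leaf.gate) * m
                  for c, m in zip(leaf.mem_candidate, memory)]
    return outputs, memory


# Hand-set toy tree for the task "output 1 once a 1 has ever been seen".
# Feature 0 is x_t; feature 1 is the single memory cell m_t[0].
tree = Node(
    feature=0, threshold=0.5,
    left=Node(
        feature=1, threshold=0.5,
        left=Leaf(output=0.0, mem_candidate=[0.0], gate=0.0),   # gate 0: keep memory
        right=Leaf(output=1.0, mem_candidate=[0.0], gate=0.0),  # gate 0: keep memory
    ),
    right=Leaf(output=1.0, mem_candidate=[1.0], gate=1.0),  # gate 1: write 1 into memory
)

outputs, memory = run_sequence(tree, [[0.0], [0.0], [1.0], [0.0], [0.0]], n_m=1)
print(outputs, memory)  # [0.0, 0.0, 1.0, 1.0, 1.0] [1.0]
```

The example shows why memory matters: after the `1` at step 3 the raw input returns to `0`, yet the tree keeps answering `1` because its split on the memory feature reads the stored value. A lag-feature tree would need a window covering the whole history to do the same.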
Abstract
Neural architectures such as Recurrent Neural Networks (RNNs), Transformers, and State-Space Models have shown great success in handling sequential data by learning temporal dependencies. Decision Trees (DTs), on the other hand, remain a widely used class of models for structured tabular data but are typically not designed to capture sequential patterns directly. Instead, DT-based approaches for time-series data often rely on feature engineering, such as manually incorporating lag features, which can be suboptimal for capturing complex temporal dependencies. To address this limitation, we introduce ReMeDe Trees, a novel recurrent DT architecture that integrates an internal memory mechanism, similar to RNNs, to learn long-term dependencies in sequential data. Our model learns hard, axis-aligned decision rules for both output generation and state updates, optimizing them efficiently via gradient descent. We provide a proof-of-concept study on synthetic benchmarks to demonstrate the effectiveness of our approach.
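Hard splits have zero gradient almost everywhere, so learning them "via gradient descent" needs a surrogate. One common option is a straight-through estimator: the forward pass uses the hard 0/1 decision, while the backward pass substitutes the gradient of a sigmoid. The sketch below assumes that scheme for a single threshold on 1-D data. `hard_split_st`, the squared-error loss, and the toy data are hypothetical, and the paper's actual parameterization (for example, how split features are selected) may differ.

```python
import math


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def hard_split_st(x: float, threshold: float, temperature: float = 1.0):
    """Straight-through split (illustrative only).

    Forward: hard decision, 1.0 if x > threshold else 0.0.
    Backward: d/d(threshold) of the smooth surrogate sigmoid((x - threshold) / T),
    used in place of the step function's zero gradient.
    """
    s = sigmoid((x - threshold) / temperature)
    hard = 1.0 if x > threshold else 0.0
    d_threshold = -s * (1.0 - s) / temperature
    return hard, d_threshold


# Toy 1-D data: any threshold in [0.3, 0.7) separates the two classes.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

threshold, lr = 0.85, 0.1  # start with a threshold that misclassifies 0.7 and 0.8
for _ in range(50):
    grad = 0.0
    for x, y in zip(xs, ys):
        pred, d_pred_d_thr = hard_split_st(x, threshold)
        # Squared error on the *hard* prediction; chain rule through the surrogate.
        grad += 2.0 * (pred - y) * d_pred_d_thr
    threshold -= lr * grad

preds = [hard_split_st(x, threshold)[0] for x in xs]
print(round(threshold, 3), preds)
```

Only misclassified points contribute a nonzero gradient, so the threshold moves down until every hard prediction matches its label and then stays put. The learned rule remains a crisp, readable "x > threshold" test.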
