Experiments with Optimal Model Trees

Sabino Francesco Roselli, Eibe Frank

TL;DR

The paper addresses learning globally optimal model trees with linear leaf models using mixed-integer linear programming (MILP), for both classification and regression. It introduces univariate and multivariate MILP formulations (ORMT, OCMT, and their hybrids) and extends them to multi-class problems, with linear SVMs serving as the leaf models. Empirical results show that MILP-based optimal model trees can match or exceed greedy and some optimal baselines in accuracy while remaining compact, which makes interpretability a key benefit. Scalability is the major limitation: larger trees frequently hit time limits, so the method is best suited to small, highly interpretable models in settings where both accuracy and transparency are critical.

Abstract

Model trees provide an appealing way to perform interpretable machine learning for both classification and regression problems. In contrast to "classic" decision trees with constant values in their leaves, model trees can use linear combinations of predictor variables in their leaf nodes to form predictions, which can help achieve higher accuracy and smaller trees. Typical algorithms for learning model trees from training data work in a greedy fashion, growing the tree in a top-down manner by recursively splitting the data into smaller and smaller subsets. Crucially, the selected splits are only locally optimal, potentially rendering the tree overly complex and less accurate than a tree whose structure is globally optimal for the training data. In this paper, we empirically investigate the effect of constructing globally optimal model trees for classification and regression with linear support vector machines at the leaf nodes. To this end, we present mixed-integer linear programming formulations to learn optimal trees, compute such trees for a large collection of benchmark data sets, and compare their performance against greedily grown model trees in terms of interpretability and accuracy. We also compare to classic optimal and greedily grown decision trees, random forests, and support vector machines. Our results show that optimal model trees can achieve competitive accuracy with very small trees. We also investigate the effect on accuracy of replacing axis-parallel splits with multivariate ones, forgoing interpretability while potentially obtaining greater accuracy.
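
To make the distinction between constant leaves and linear leaves concrete, the following is a minimal illustrative sketch of how a model tree with axis-parallel splits forms a prediction. It is not the paper's MILP formulation and does not learn anything: the class names `Leaf` and `Split`, the function `predict`, and the hand-picked weights are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    """Leaf holding a linear model: prediction = weights . x + bias.

    A classic decision tree leaf is the special case where all weights are 0.
    """
    weights: List[float]
    bias: float

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(self.weights, x)) + self.bias


@dataclass
class Split:
    """Axis-parallel (univariate) split: go left if x[feature] <= threshold."""
    feature: int
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict(node: Node, x: Sequence[float]) -> float:
    """Route x from the root to a leaf, then apply that leaf's linear model."""
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.predict(x)


# Hand-built example (assumed values): a single split with two linear leaves
# represents y = |x| exactly, which constant leaves could only approximate
# with many more splits.
tree = Split(
    feature=0,
    threshold=0.0,
    left=Leaf(weights=[-1.0], bias=0.0),
    right=Leaf(weights=[1.0], bias=0.0),
)

print(predict(tree, [-2.0]))
print(predict(tree, [3.0]))
```

The example shows why linear leaves can lead to smaller trees: one split suffices here, whereas a tree with constant leaves would need a staircase of splits to approximate the same function.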

Paper Structure

This paper contains 12 sections, 6 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Connection between the variables $d_t$ for a perfect tree of $\mathcal{D} = 3$ (a) and the corresponding decision tree (b)
  • Figure 2: Average accuracy and corresponding standard deviation over 30 runs for each classification data set when comparing the glass box decision trees.
  • Figure 3: Average accuracy and corresponding standard deviation over 30 runs for each classification data set when comparing univariate and multivariate MILP-grown classification trees with and without SVMs in the leaf nodes against LMT, CART, RF, SVM, and LS-OMS.
  • Figure 4: Average RAE and corresponding standard deviation over 30 runs for each regression data set when comparing univariate and multivariate MILP-grown regression trees with and without SVMs in the leaf nodes against M5P, CART, RF, SVM, SRT-L, and LS-OMS. "-" means that no tree could be computed for the instance, hence no data is available.
  • Figure 5: Average RRSE and corresponding standard deviation over 30 runs for each regression data set when comparing univariate and multivariate MILP-grown regression trees with and without SVMs in the leaf nodes against M5P, CART, RF, SVM, SRT-L, and LS-OMS.
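
Figures 4 and 5 report RAE and RRSE. As a reading aid, the sketch below computes both under their commonly used definitions: relative absolute error and root relative squared error, each measured against a baseline that always predicts the mean target. This is an assumption about the metrics, not the paper's evaluation code. In particular, the paper may take the baseline mean from the training data or report the values as percentages, and the function names `rae` and `rrse` are hypothetical.

```python
import math
from typing import Sequence


def rae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Relative absolute error: total absolute error of the model divided by
    the total absolute error of always predicting the mean of `actual`."""
    mean = sum(actual) / len(actual)
    numerator = sum(abs(p - a) for a, p in zip(actual, predicted))
    denominator = sum(abs(a - mean) for a in actual)
    return numerator / denominator


def rrse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root relative squared error: square root of the model's squared error
    divided by the squared error of always predicting the mean of `actual`."""
    mean = sum(actual) / len(actual)
    numerator = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    denominator = sum((a - mean) ** 2 for a in actual)
    return math.sqrt(numerator / denominator)


actual = [1.0, 2.0, 3.0]
print(rae(actual, [2.0, 2.0, 2.0]), rrse(actual, [2.0, 2.0, 2.0]))
print(rae(actual, actual), rrse(actual, actual))
```

Under these definitions, a value of 1 means the model is no better than predicting the mean, and lower is better.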