Optimal Mixed Integer Linear Optimization Trained Multivariate Classification Trees

Brandon Alston; Illya V. Hicks

TL;DR

This work tackles the NP-hard problem of learning optimal multivariate decision trees (MDTs) by formulating two cut-based MILO models for a biobjective problem over a tree of depth $h$: maximize the number of correctly classified datapoints and minimize the number of branching vertices. The models use MIS-based path feasibility cuts and shattering inequalities to generate strong cutting planes without big-$M$ constants, with connectivity constraints added on the fly. Compared with existing flow-based MDT formulations and MILO baselines, the proposed CUT-H variant often achieves better solution times and out-of-sample accuracy on 14 UCI datasets, while allowing imbalanced trees. The work extends univariate decision tree approaches to the multivariate setting and provides public code to support reproducibility.

Abstract

Multivariate decision trees are powerful machine learning tools for classification and regression that attract many researchers and industry professionals. A binary tree has two types of vertices: (i) branching vertices, which have exactly two children and where datapoints are assessed on a set of discrete features, and (ii) leaf vertices, at which datapoints are given a prediction. An optimal tree can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. The branching rules at these vertices are linear combinations of training features and can therefore be thought of as hyperplanes. In this paper, we propose two cut-based mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees (leaf vertices assign discrete classes). Our models leverage on-the-fly identification of minimal infeasible subsystems (MISs), from which we derive cutting planes that take the form of packing constraints. We show theoretical improvements on the strongest flow-based MILO formulation currently in the literature and conduct experiments on publicly available datasets to show our models' ability to scale, strength against traditional branch and bound approaches, and robustness in out-of-sample test performance. Our code and data are available on GitHub.
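To make the two objectives concrete, the following is a minimal sketch of evaluating a fixed multivariate tree; it is not the paper's MILO formulation or its released code. It assumes heap-style vertex numbering (children of $v$ are $2v$ and $2v+1$), a hypothetical routing convention (go left when $a^\top x \le b$), and made-up hyperplanes and datapoints. The tree shape mirrors the Figure 2 example: vertices 1 and 2 branch, and vertices 3, 4, and 5 carry classes 1, 2, and 3.

```python
# Illustrative only: names, hyperplanes, and data below are assumptions.

def predict(x, branches, classes):
    """Route x from the root (vertex 1) until it hits a class-assigned vertex."""
    v = 1
    while v not in classes:
        a, b = branches[v]  # hyperplane a.x <= b at branching vertex v
        score = sum(a_j * x_j for a_j, x_j in zip(a, x))
        # Assumed convention: left child is 2v, right child is 2v + 1.
        v = 2 * v if score <= b else 2 * v + 1
    return classes[v]


def objectives(X, y, branches, classes):
    """Return (correctly classified datapoints, number of branching vertices)."""
    correct = sum(predict(x, branches, classes) == label
                  for x, label in zip(X, y))
    return correct, len(branches)


# Vertices 1 and 2 are assigned hyperplanes (weights, offset).
branches = {1: ([1.0, 1.0], 1.0), 2: ([1.0, -1.0], 0.0)}
# Vertices 3, 4, 5 are assigned classes; vertices 6 and 7 are pruned.
classes = {3: 1, 4: 2, 5: 3}

X = [(2.0, 2.0), (0.0, 0.5), (0.5, 0.0), (1.0, 1.0)]
y = [1, 2, 3, 2]

print(objectives(X, y, branches, classes))
```

The optimization problem described above searches over the hyperplanes and class assignments themselves; this sketch only scores one candidate tree on both criteria.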

Paper Structure (11 sections, 9 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Our Formulations
Cut Based Path Feasibility
Shattering Inequalities
Sub-processes
Computational Experiments
Experimental Setup
Experimental Results
Discussion
Conclusions and Future Work

Figures (5)

Figure 1: Two examples of multivariate tree compactness. In (a) and (b) you need 10 univariate vs 1 multivariate decision(s). In (c) and (d) you need 7 univariate vs 2 multivariate decisions.
Figure 2: Input decision tree $G_2=(B \cup L, E)$, branching vertex set $B=\{1,2,3\}$ and leaf vertex set $L=\{4,5,6,7\}$. Here, vertices $1$ and $2$ are assigned branching hyperplanes; vertices $3$, $4$, and $5$ are assigned to classes 1, 2, and 3, respectively; and vertices 6 and 7 are pruned. Figure taken from alston2023.
Figure 3: Let $a, b, c$, and $v$ be nodes selected on the $1,v$-path of datapoint $i \in I$ at a fractional point in the branch and bound tree with $s^i_v$ and $q^i_u$ for $u \in P_v$ as defined. The 3 types of fractional separation cuts are indicated above/below for $\operatorname{CUT_{w}-H}$/$\operatorname{CUT-H}$, respectively. The III in parentheses is a most violating cut considered but not added. Figure taken from alston2023.
Figure 4: MDT biobjective results for ion. Priority on objective \ref{basemax}.
Figure 5: MDT Pareto Frontiers and solution time distribution for ion and iris.
