Acceleration Techniques for Learning Optimal Classification Trees with Integer Programming

Mitchell Keegan; Michael Forbes; Paul Corry; Mahdi Abolghasemi

TL;DR

This work accelerates BendOCT, a mixed-integer programming (MIP) formulation for learning globally optimal classification trees (OCTs). It derives BendOCT via logic-based Benders decomposition (LBBD) and introduces a suite of enhancements: strengthened Benders cuts, a solution-polishing primal heuristic, equivalent-point (EQP) inequalities, and path-bound cutting planes that use optimal solutions of depth-2 subtrees. Empirical results across 33 datasets show large scalability gains, with many more instances solved to optimality within a 1-hour limit (e.g., 1173 of 1620 versus 582 of 1620 for the baseline). This highlights the value of bounds inspired by dynamic programming (DP) within a MIP framework. The approach broadens the practical applicability of optimal decision trees by combining the flexibility of MIP with significantly improved convergence, and it could potentially extend to other objectives and deeper trees.

Abstract

Decision trees are a popular machine learning model that is traditionally trained by heuristic methods. Massive improvements in computing power and optimisation techniques have led to renewed interest in learning globally optimal decision trees. Empirical evidence shows that optimal classification trees (OCTs) have better out-of-sample performance than trees trained by heuristic methods. The dominant optimisation paradigms for training OCTs are mixed-integer programming (MIP) and dynamic programming (DP). MIP formulations offer flexibility in the objectives and constraints that can be modelled, but scale poorly with the size of the training dataset and the maximum tree depth. DP models represent the state of the art in scaling for OCTs, but lack some of the flexibility of MIP models. In this paper, we present progress on using advanced integer programming methods to integrate ideas from DP models into MIP formulations and begin bridging the scaling gap. Using the existing BendOCT model from the literature as a base model, we introduce valid inequalities, cutting planes, and a primal heuristic to improve the scaling of MIP formulations. We show that these techniques significantly improve the ability of BendOCT to find provably optimal solutions over a wide range of datasets.
