An improved column-generation-based matheuristic for learning classification trees

Krunal Kishor Patel; Guy Desaulniers; Andrea Lodi

TL;DR

This work addresses the scalability gap in learning accurate decision trees by refining the column-generation-based matheuristic of Firat et al. (2020). It merges subproblems (SPs) so that multiclass instances require far fewer of them, shows that the data-dependent constraints (beta) of the master problem (MP) are implied and adds them only as cutting planes, and introduces a CP-SAT-based separation model that finds data rows whose constraints the current LP relaxation solution violates, all complemented by improved preprocessing and initialization. Computational results on 12 UCI datasets show that merged SPs, constraints (beta) used as cuts, and on-demand separation yield faster training and larger accuracy gains than the original approach, particularly on large datasets. The paper also outlines directions for further improvement and for better generalization to out-of-sample data.

Abstract

Decision trees are highly interpretable models for solving classification problems in machine learning (ML). The standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy. Other discrete optimization models in the literature address the optimality problem but only work well on relatively small datasets. Firat et al. (2020) proposed a column-generation-based heuristic approach for learning decision trees. This approach improves scalability and can work with large datasets. In this paper, we describe improvements to this column generation approach. First, we modify the subproblem model to significantly reduce the number of subproblems in multiclass classification instances. Next, we show that the data-dependent constraints in the master problem are implied, and use them as cutting planes. Furthermore, we describe a separation model to generate data points for which the linear programming relaxation solution violates their corresponding constraints. We conclude by presenting computational results that show that these modifications result in better scalability.
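The idea of adding data-dependent constraints only when the current LP relaxation solution violates them can be illustrated with a small sketch. It is not the paper's separation model, which is an optimization model: here the search is replaced by plain enumeration over rows. The constraint form (one `<=` row per data point), the function name `violated_rows`, and the dictionary layout are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: data row i owns one constraint
#   sum_k coeffs[k] * x[k] <= rhs
# stored as (coeffs, rhs). This form is an assumption, not the paper's model.
Row = Tuple[Dict[str, float], float]


def violated_rows(rows: Dict[int, Row], x: Dict[str, float],
                  tol: float = 1e-6) -> List[int]:
    """Return ids of rows whose constraint x violates, most violated first."""
    scored = []
    for row_id, (coeffs, rhs) in rows.items():
        # Evaluate the left-hand side at the current LP solution.
        lhs = sum(c * x.get(var, 0.0) for var, c in coeffs.items())
        if lhs > rhs + tol:
            scored.append((lhs - rhs, row_id))
    # Largest violation first, so a caller can add only the top few cuts.
    scored.sort(reverse=True)
    return [row_id for _, row_id in scored]


# Toy data: three rows and a fractional LP solution.
rows = {
    0: ({"a": 1.0, "b": 1.0}, 1.0),
    1: ({"a": 1.0}, 1.0),
    2: ({"b": 4.0}, 1.0),
}
x = {"a": 0.75, "b": 0.75}
cuts = violated_rows(rows, x)
```

In a cutting-plane loop, the rows returned here would be added to the master problem, the LP re-solved, and the check repeated until no row is violated.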

Paper Structure (13 sections, 2 theorems, 9 equations, 8 figures, 9 tables)

Introduction
Overview of the Firat et al. (2020) column generation approach
Modifications of the column generation approach
Merged SPs
Redundancy of MP constraints (beta)
Search for violated MP constraints (beta)
Computational results
Datasets and experimental setup
Initialization and preprocessing results
Merged SPs results
MP constraints (beta) results
Comparison with Firat et al. (2020)
Conclusion and future work

Key Result

Lemma 3.1

In any solution to the model (masterold) that satisfies the constraints (alpha), (gamma), and (firatbinrho), there exists a split check $a^*_j\in S_j$ for every internal node $j\in N_{int}$ such that

Figures (8)

Figure 1: Solving time for different ways of using preprocessing and initialization.
Figure 2: Accuracy gain over CART for different ways of using preprocessing and initialization.
Figure 3: Number of columns added with original and merged SPs.
Figure 4: Accuracy gain over CART on training datasets with original and merged SPs.
Figure 5: Solving time for different ways of using constraints (beta).
...and 3 more figures

Theorems & Definitions (4)

Lemma 3.1
proof
Theorem 3.2
proof
