
A Mathematical Programming Approach to Optimal Classification Forests

Víctor Blanco, Alberto Japón, Justo Puerto, Peter Zhang

TL;DR

The paper tackles the challenge of achieving high predictive accuracy while maintaining interpretability in classification. It introduces Weighted Optimal Classification Forests (WOCFs), which train a forest of $R$ decision trees simultaneously through a single mixed-integer linear program (MILP) and classify each observation by voting among the trees. Key contributions include the MILP formulation, symmetry-breaking techniques that curb the redundancy created by permuting the trees, and experiments showing that WOCFs perform as well as or better than state-of-the-art tree-based methods on small to medium-sized datasets, a Pareto improvement in accuracy and interpretability. Three real-world case studies complement the experiments and illustrate interpretability and counterfactual reasoning. The method offers an interpretable alternative to CART, Optimal Classification Trees (OCT), Random Forests (RF), and XGBoost for high-stakes problems with moderate data sizes, while leaving scalability and refinements of the formulation as open directions.
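Because the vote does not depend on the order of the trees, any permutation of the $R$ trees in a MILP solution describes the same classifier, so a solver may explore up to $R!$ equivalent solutions. The sketch below illustrates this with a generic symmetry-breaking rule: keep only orderings in which the trees appear in nondecreasing lexicographic order. The `(feature, threshold)` stump encoding and the function names are hypothetical, and this is not the paper's formulation; in an actual MILP such an ordering would be imposed through linear constraints on the tree variables.

```python
from itertools import permutations
from math import factorial

# Hypothetical encoding: a depth-1 tree ("stump") as (feature index, threshold).
Stump = tuple[int, float]


def is_canonical(forest: tuple[Stump, ...]) -> bool:
    """Illustrative symmetry-breaking rule: trees must appear in
    nondecreasing lexicographic order of their encodings."""
    return all(forest[r] <= forest[r + 1] for r in range(len(forest) - 1))


def count_equivalent(forest: list[Stump]) -> tuple[int, int]:
    """Return (#distinct orderings of the trees, #orderings the rule keeps).

    Every ordering yields the same majority vote, so all of them are
    equivalent solutions; the rule keeps exactly one representative.
    """
    orderings = set(permutations(forest))
    kept = sum(1 for p in orderings if is_canonical(p))
    return len(orderings), kept


if __name__ == "__main__":
    forest = [(2, 0.7), (0, 0.5), (1, 0.3)]
    total, kept = count_equivalent(forest)
    print(total, kept, factorial(len(forest)))  # 6 1 6
```

With three distinct trees, all $3! = 6$ orderings are equivalent and the ordering rule retains a single one.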

Abstract

This paper introduces Weighted Optimal Classification Forests (WOCFs), a new family of classifiers that exploits an optimal ensemble of decision trees to derive accurate and interpretable classifiers. We propose a novel mathematical optimization-based methodology that simultaneously constructs a given number of trees, each of which provides a predicted class for the observations in the feature space. The classification rule assigns to each observation its most frequently predicted class among the trees. We provide a mixed integer linear programming (MIP) formulation for the problem and several novel MIP strengthening and scaling techniques. Our computational experiments show that our method performs as well as or better than state-of-the-art tree-based classification methods on small to medium-sized instances. We also present three real-world case studies showing that our methodology has notable implications for interpretability. Overall, WOCFs complement existing methods such as CART, Optimal Classification Trees, Random Forests and XGBoost. In addition to a Pareto improvement in accuracy and interpretability, we also observe a distinctive property: different trees tend to focus on different feature variables. This provides a nontrivial improvement in the interpretability and usability of the trained model for counterfactual explanation. Thus, despite the computational challenges of WOCFs, which limit the size of the problems that current MIP solvers can solve efficiently, this is an important research direction that can yield qualitatively different insights for researchers and complement the toolbox of practitioners facing high-stakes problems.
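The classification rule above is a plurality vote. Writing $\hat{y}_r(x)$ for the class predicted by tree $r$ (notation assumed here), it reads $\hat{y}(x) = \arg\max_{k} \sum_{r=1}^{R} \mathbb{1}[\hat{y}_r(x) = k]$. The sketch below applies this rule to precomputed tree predictions; the function name `forest_predict` and the smallest-label tie-breaking rule are assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Sequence


def forest_predict(tree_predictions: Sequence[Sequence[int]]) -> list[int]:
    """Plurality vote: each observation gets its most frequently predicted class.

    tree_predictions[r][i] is the class that tree r predicts for observation i.
    Ties are broken toward the smallest class label (an assumption of this
    sketch; the source does not specify a tie-breaking rule).
    """
    if not tree_predictions:
        raise ValueError("the forest must contain at least one tree")
    n_obs = len(tree_predictions[0])
    if any(len(p) != n_obs for p in tree_predictions):
        raise ValueError("every tree must predict every observation")
    labels = []
    for i in range(n_obs):
        # Count the votes the R trees cast for observation i.
        votes = Counter(p[i] for p in tree_predictions)
        top = max(votes.values())
        # Among the classes with the most votes, pick the smallest label.
        labels.append(min(k for k, v in votes.items() if v == top))
    return labels


if __name__ == "__main__":
    preds = [
        [0, 1, 2, 1],  # tree 1
        [0, 2, 2, 0],  # tree 2
        [1, 1, 0, 2],  # tree 3 (observation 3 is a three-way tie)
    ]
    print(forest_predict(preds))  # [0, 1, 2, 0]
```

A weighted variant would add a per-tree weight to each count; whether and how WOCFs weight the trees is not stated in this summary.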

Paper Structure

This paper contains 12 sections, 10 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Binary classification problem with its split solution in $[0,1]\times[0,1]$ (left) and the corresponding OCT solution in tree form (right)
  • Figure 2: Optimal classification forest solution.
  • Figure 3: Misclassifications occur in individual trees.
  • Figure 4: OCT solution ($\sim 78\%$ accuracy) for the Mortgage case study.
  • Figure 5: 3-OCF solution ($\sim 82\%$ accuracy) for the Mortgage case study.
  • ...and 4 more figures