RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees
Cristobal Heredia, Pedro Chumpitaz-Flores, Kaixun Hua
TL;DR
This work tackles the computational challenge of globally optimizing regression trees with continuous features by introducing RS-ORT, a reduced-space branch-and-bound algorithm that only branches on tree-structure variables. It achieves scalability through three bound-tightening techniques (leaf predictions in closed form, discretized thresholds, and exact depth-1 subtree parsing) and a two-stage, decomposable formulation with parallelizable bound computations, yielding convergence independent of the number of samples. Empirically, RS-ORT attains provably optimal training with negligible gaps on diverse benchmarks, including a 2-million-sample dataset solved within four hours, and often surpasses state-of-the-art MIP and heuristic baselines in both training accuracy and generalization. The approach thus offers a robust, interpretable, and scalable alternative for exact regression-tree learning directly on continuous features, with strong implications for high-stakes domains where transparency and optimality matter.
Abstract
Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees. Yet existing MIP approaches for regression tasks are either limited to purely binary features or become computationally intractable on large-scale data with continuous features. Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees. We recast optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT), a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables. This design guarantees convergence independently of the number of training samples. Leveraging the model's structure, we introduce several bound-tightening techniques (closed-form leaf prediction, empirical threshold discretization, and exact depth-1 subtree parsing) that combine with decomposable upper and lower bounding strategies to accelerate training. The node-wise decomposition of the BB procedure makes parallel execution straightforward, which further eases the computational burden even for datasets with millions of samples. In empirical studies on several regression benchmarks containing both binary and continuous features, RS-ORT delivers better training and testing performance than state-of-the-art methods. Notably, on datasets with up to 2,000,000 samples and continuous features, RS-ORT obtains guaranteed training performance within four hours, with a simpler tree structure and better generalization.
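To make two of the bound-tightening ideas named above more concrete, the sketch below illustrates closed-form leaf prediction and an exact solve of a depth-1 subtree. It is not the RS-ORT implementation. It assumes a squared-error training loss, which the abstract does not state explicitly, and the function names `leaf_prediction` and `best_depth1_split` are hypothetical. Under that assumption, the optimal constant prediction in a leaf is the mean of its targets, so a depth-1 tree can be solved exactly by scanning the candidate thresholds of each feature with prefix sums.

```python
import numpy as np


def leaf_prediction(y):
    """Closed-form optimal leaf value under squared error: the sample mean."""
    return float(np.mean(y))


def best_depth1_split(X, y):
    """Exactly solve a depth-1 regression tree by enumerating all splits.

    Returns a dict with the minimal sum of squared errors (SSE), the chosen
    feature index, and the threshold. Feature and threshold are None when
    no split improves on a single leaf.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    total_sum, total_sq = y.sum(), (y ** 2).sum()

    # Baseline: a single leaf predicting the global mean.
    best = {"sse": float(total_sq - total_sum ** 2 / n),
            "feature": None, "threshold": None}

    for j in range(p):
        order = np.argsort(X[:, j], kind="stable")
        xs, ys = X[order, j], y[order]

        # Prefix sums give the SSE of every left/right partition in O(n).
        left_n = np.arange(1, n)
        right_n = n - left_n
        left_sum = np.cumsum(ys)[:-1]
        left_sq = np.cumsum(ys ** 2)[:-1]
        sse = (left_sq - left_sum ** 2 / left_n) + (
            (total_sq - left_sq) - (total_sum - left_sum) ** 2 / right_n
        )

        # A threshold can only sit between two distinct feature values.
        valid = xs[:-1] < xs[1:]
        if not valid.any():
            continue
        sse = np.where(valid, sse, np.inf)

        k = int(np.argmin(sse))
        if sse[k] < best["sse"]:
            best = {"sse": float(sse[k]), "feature": j,
                    "threshold": float((xs[k] + xs[k + 1]) / 2)}
    return best
```

For example, `best_depth1_split([[1.0], [2.0], [3.0], [4.0]], [0.0, 0.0, 10.0, 10.0])` selects feature 0 with a threshold midway between 2 and 3, and `leaf_prediction` applied to each side gives the two leaf values. Placing the threshold at the midpoint of adjacent distinct values is a choice made for this sketch. Any value in that interval gives the same training error.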
