RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees

Cristobal Heredia, Pedro Chumpitaz-Flores, Kaixun Hua

TL;DR

This work tackles the computational challenge of globally optimizing regression trees with continuous features by introducing RS-ORT, a reduced-space branch-and-bound algorithm that branches only on tree-structure variables. It scales through three bound-tightening techniques (closed-form leaf predictions, discretized thresholds, and exact depth-1 subtree parsing) and a two-stage, decomposable formulation with parallelizable bound computations, so that convergence does not depend on the number of samples. Empirically, RS-ORT attains provably optimal training with negligible gaps on diverse benchmarks, including a 2-million-sample dataset solved within four hours, and often surpasses state-of-the-art MIP and heuristic baselines in both training accuracy and generalization. The approach thus offers an interpretable and scalable alternative for exact regression-tree learning directly on continuous features, which is relevant to high-stakes domains where transparency and optimality matter.

Abstract

Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees. Yet existing MIP approaches for regression tasks are either limited to purely binary features or become computationally intractable on large-scale data with continuous features. Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees. We recast optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT), a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables. This design guarantees the algorithm's convergence and makes that convergence independent of the number of training samples. Leveraging the model's structure, we introduce several bound-tightening techniques (closed-form leaf prediction, empirical threshold discretization, and exact depth-1 subtree parsing) that combine with decomposable upper- and lower-bounding strategies to accelerate training. The BB node-wise decomposition enables trivial parallel execution, further alleviating the computational burden even for datasets with millions of samples. In empirical studies on several regression benchmarks containing both binary and continuous features, RS-ORT also delivers better training and testing performance than state-of-the-art methods. Notably, on datasets with up to 2,000,000 samples and continuous features, RS-ORT obtains guaranteed training performance with a simpler tree structure and better generalization within four hours.
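To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of an exhaustive depth-1 tree search with closed-form leaf predictions. It is an illustration under stated assumptions, not the RS-ORT implementation: it assumes a squared-error loss (for which the optimal constant prediction in a leaf is the mean of the targets in that leaf) and it takes candidate thresholds from the distinct feature values observed in the training data. The function names `sse` and `best_stump` and the toy data are hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors around the mean.

    Under squared-error loss (an assumption of this sketch), the mean is
    the optimal constant prediction for a leaf, so this is the leaf's
    minimal loss in closed form.
    """
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_stump(X, y):
    """Exhaustively search all depth-1 trees.

    Returns (loss, feature, threshold, left_mean, right_mean).
    feature and threshold are None if no split beats a single leaf.
    """
    n, p = len(X), len(X[0])
    best = (sse(y), None, None, fmean(y), fmean(y))
    for j in range(p):
        # Candidate thresholds: distinct observed values of feature j.
        values = sorted({row[j] for row in X})
        # Skip the largest value: it would leave the right leaf empty.
        for t in values[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            loss = sse(left) + sse(right)
            if loss < best[0]:
                best = (loss, j, t, fmean(left), fmean(right))
    return best


# Toy data (hypothetical): feature 0 separates the targets perfectly.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [1.0, 1.0, 5.0, 5.0]
result = best_stump(X, y)
print(result)
```

Because the leaf predictions are recovered in closed form once the split is fixed, the search only ranges over the structural choices (feature and threshold). This is the same separation between structure variables and leaf values that the abstract describes, shown here only at depth 1 and without any bounding or parallelism.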

Paper Structure

This paper contains 25 sections, 4 theorems, 36 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $n$ be the number of samples, $P$ the number of features, and $D$ the maximum depth. Assume that each split threshold is chosen from among the distinct feature values present in the training data. Then the number of distinct tree structures is at most

Figures (1)

  • Figure 1: Train and test RMSE comparison between OSRT (binarized input) and RS-ORT (continuous input) across four datasets.

Theorems & Definitions (6)

  • Theorem 1: Upper bound on distinct tree structures
  • proof
  • Theorem 2
  • Theorem 3: Closed-form optimality of leaf predictions
  • Theorem 4: Closed-form optimality of leaf predictions
  • proof