Foundational theory for optimal decision tree problems. II. Optimal hypersurface decision tree algorithm
Xi He
TL;DR
This work advances optimal decision tree methods by introducing hypersurface splits and the first hypersurface ODT (HODT) algorithm. It builds on Part I's axiomatic ODT framework to develop theoretical guarantees around crossed hyperplanes and ancestry relations, then delivers an exact HODT procedure that embeds data via a Veronese mapping and fuses generation, filtering, and evaluation into a single recursive pass. Because the exact problem is intractable at scale, the work also proposes two practical heuristics, hodtCoreset and hodtWSH, which enable scalable performance on synthetic and real-world datasets. Across extensive experiments, HODT demonstrates higher accuracy and greater robustness to noise than axis-parallel baselines when model complexity is controlled, highlighting the value of richer hypersurface splits for interpretable, high-performance decision trees. The work further points to future directions, including handling categorical data, mixed splitting rules, and ensembles such as random forests built from hypersurface trees.
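To illustrate why a Veronese mapping is useful here, the following is a minimal sketch, not the paper's implementation: it maps each data point to all monomials of its coordinates up to a chosen degree, so that a polynomial hypersurface in the original space becomes a hyperplane in the embedded space. The function name `veronese_embedding`, the monomial ordering, and the omission of the constant term are assumptions made for this example.

```python
from itertools import combinations_with_replacement

import numpy as np


def veronese_embedding(X, degree):
    """Map each row of X to all monomials of total degree 1..degree.

    Hypothetical helper for illustration; the constant monomial is left
    out and treated as the hyperplane offset instead.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    columns = []
    for k in range(1, degree + 1):
        # Each multiset of k feature indices corresponds to one monomial.
        for idx in combinations_with_replacement(range(n_features), k):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


# The unit circle x^2 + y^2 - 1 = 0 is a hyperplane in the embedded space.
# For two features and degree 2 the columns are [x, y, x^2, x*y, y^2].
points = np.array([[0.0, 0.0], [2.0, 0.0]])
Z = veronese_embedding(points, degree=2)
w = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
offset = -1.0
side = Z @ w + offset  # negative inside the circle, positive outside
```

With this embedding, any routine that searches over hyperplane splits in the embedded space is implicitly searching over polynomial hypersurface splits in the original space, at the cost of a larger feature dimension.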
Abstract
Decision trees are a ubiquitous model for classification and regression tasks due to their interpretability and efficiency. However, solving the optimal decision tree (ODT) problem remains a challenging combinatorial optimization task. Even for the simplest splitting rules--axis-parallel hyperplanes--the problem is NP-hard. In Part I of this series, we rigorously defined the proper decision tree model through four axioms and, based on these, introduced four formal definitions of the ODT problem. From these definitions, we derived four generic algorithms capable of solving ODT problems for arbitrary decision trees satisfying the axioms. We also analyzed the combinatorial geometric properties of hypersurfaces, showing that decision trees defined by polynomial hypersurface splitting rules satisfy the axioms we proposed. In this second paper (Part II) of the two-part series, building on the algorithmic and geometric foundations established in Part I, we introduce the first hypersurface decision tree (HODT) algorithm. To the best of our knowledge, existing optimal decision tree methods are limited to hyperplane splitting rules--a special case of hypersurfaces--and rely on general-purpose solvers. In contrast, our HODT algorithm addresses the general hypersurface decision tree model without requiring external solvers. Using synthetic datasets generated from ground-truth hyperplane decision trees, we vary tree size, data size, dimensionality, and label and feature noise. The results show that our algorithm recovers the ground truth more accurately than axis-parallel trees and exhibits greater robustness to noise. We also analyze generalization performance across 30 real-world datasets, showing that HODT can achieve up to 30% higher accuracy than the state-of-the-art optimal axis-parallel decision tree algorithm when tree complexity is properly controlled.
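As a rough illustration of the synthetic setup described above, the sketch below draws data labeled by a small ground-truth hyperplane decision tree and then applies label and feature noise. It is not the paper's experimental protocol: the function name `sample_tree_data`, the fixed three-node tree, the Gaussian sampling, and the noise model are all assumptions chosen to keep the example short.

```python
import numpy as np


def sample_tree_data(n, d, label_noise=0.0, feature_noise=0.0, seed=0):
    """Sample n points in d dimensions labeled by a hypothetical hyperplane tree.

    The tree has a root and two children; each node holds one random
    hyperplane (w, b). This structure is an assumption for illustration.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    W = rng.normal(size=(3, d))   # one normal vector per internal node
    b = rng.normal(size=3)        # one offset per internal node
    side = (X @ W.T + b) >= 0     # side[i, j]: is point i on the positive side of node j

    # Route by the root (node 0), then label by the chosen child's split
    # (node 2 on the positive side of the root, node 1 otherwise).
    y_clean = np.where(side[:, 0], side[:, 2], side[:, 1]).astype(int)

    # Label noise: flip each label independently with probability label_noise.
    flip = rng.random(n) < label_noise
    y = np.where(flip, 1 - y_clean, y_clean)

    # Feature noise: additive Gaussian perturbation scaled by feature_noise.
    X_noisy = X + feature_noise * rng.normal(size=(n, d))
    return X_noisy, y, y_clean


X, y, y_clean = sample_tree_data(n=200, d=3, label_noise=0.1, feature_noise=0.05)
```

Returning the clean labels alongside the noisy ones makes it possible to score a fitted tree against the ground truth rather than against the corrupted labels.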
