Foundational theory for optimal decision tree problems. I. Algorithmic and geometric foundations
Xi He
TL;DR
This work delivers a rigorous, executable framework for optimal decision tree (ODT) problems by introducing four formal, axiomatic definitions: three for size-constrained trees and one for depth-constrained trees. It shows how to move from brute-force specifications to dynamic programming (DP) and related efficient strategies where monotonicity allows, and it identifies the cases in which DP is not feasible. The paper also extends the theory to binary-feature data and lays out a geometric foundation for hypersurface-based splits via Veronese embeddings, establishing that hypersurface decision trees are tractable under the proposed axioms. A key contribution is the disciplined treatment of problem specification, search spaces, and acceleration mechanisms (filtering and thinning), which together enable provable optimality and scalable computation. Part II will present the first optimal hypersurface decision tree algorithms and comprehensive experiments comparing axis-parallel, oblique, and hypersurface splits.
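To make the role of the Veronese embedding concrete, the following minimal Python sketch maps a point to its monomials up to a chosen degree, so that a polynomial hypersurface in the original space becomes a hyperplane test in the embedded space. The names `veronese_embedding` and `hypersurface_side`, the monomial ordering, and the choice to carry the constant term as a separate bias are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations_with_replacement


def veronese_embedding(x, degree):
    """Map x in R^D to all monomials of total degree 1..degree.

    Illustrative ordering: lower degrees first, then index tuples in
    the order produced by combinations_with_replacement.
    """
    features = []
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            term = 1.0
            for i in idx:
                term *= x[i]
            features.append(term)
    return features


def hypersurface_side(x, weights, bias, degree):
    """Evaluate a polynomial split as a linear test on the embedding."""
    z = veronese_embedding(x, degree)
    return sum(w * zi for w, zi in zip(weights, z)) + bias >= 0


# Unit circle x^2 + y^2 - 1 = 0 in R^2, written as a hyperplane over
# the embedded coordinates (x, y, x^2, xy, y^2).
circle_weights = [0.0, 0.0, 1.0, 0.0, 1.0]
circle_bias = -1.0

print(hypersurface_side([0.5, 0.5], circle_weights, circle_bias, 2))  # False (inside)
print(hypersurface_side([2.0, 0.0], circle_weights, circle_bias, 2))  # True (outside)
```

The point of the sketch is only that a degree-2 split in two variables is an ordinary linear split in five embedded coordinates; how the paper enumerates candidate hypersurfaces is not shown here.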
Abstract
In this first paper (Part I) of a two-part series, we introduce four novel definitions of the optimal decision tree (ODT) problem: three for size-constrained trees and one for depth-constrained trees. These definitions are stated unambiguously through executable recursive programs and satisfy all the criteria we propose for a formal specification. In this sense, they resemble the "standard form" used in the study of general-purpose solvers. Grounded in algebraic programming theory, a relational formalism for deriving correct-by-construction algorithms from specifications, we can not only establish the existence or nonexistence of dynamic programming solutions but also derive them constructively whenever they exist. Consequently, the four generic problem definitions yield four novel optimal algorithms for ODT problems with arbitrary splitting rules that satisfy the axioms and with objective functions of a given form. These algorithms encompass the known depth-constrained, axis-parallel ODT algorithm as a special case, while providing a unified, efficient, and elegant solution for the general ODT problem. In Part II, we present the first optimal hypersurface decision tree algorithm and provide comprehensive experiments against axis-parallel decision tree algorithms, including heuristic CART and state-of-the-art optimal methods. The results demonstrate the significant potential of decision trees with flexible splitting rules. Moreover, our framework is readily extensible to support algorithms for constructing even more flexible decision trees, including those with mixed splitting rules.
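As a rough illustration of what an executable recursive specification of a depth-constrained problem can look like, the sketch below solves the axis-parallel case on binary features by exhaustive recursion with memoization. It is a generic, well-known pattern and not the algorithm derived in the paper; the function name `optimal_depth_tree`, the misclassification-count objective, and the tuple-based tree encoding are assumptions made for this example.

```python
from functools import lru_cache


def optimal_depth_tree(X, y, max_depth):
    """Return (errors, tree) for the best tree of depth <= max_depth.

    X is a list of binary feature tuples and y a list of 0/1 labels.
    The objective is the number of misclassified points, which is a
    sum over leaves, so optimal subtrees can be combined independently.
    """
    n_features = len(X[0])

    def leaf(indices):
        ones = sum(y[i] for i in indices)
        zeros = len(indices) - ones
        label = 1 if ones > zeros else 0
        return min(ones, zeros), ("leaf", label)

    @lru_cache(maxsize=None)
    def solve(indices, depth):
        best = leaf(indices)
        if depth == 0 or best[0] == 0:
            return best
        for f in range(n_features):
            left = tuple(i for i in indices if X[i][f] == 0)
            right = tuple(i for i in indices if X[i][f] == 1)
            if not left or not right:
                continue  # this split does not separate the data
            lcost, ltree = solve(left, depth - 1)
            rcost, rtree = solve(right, depth - 1)
            if lcost + rcost < best[0]:
                best = (lcost + rcost, ("split", f, ltree, rtree))
        return best

    return solve(tuple(range(len(X))), max_depth)


# XOR labels: no depth-1 tree helps, but a depth-2 tree is perfect.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
print(optimal_depth_tree(X, y, 1)[0])  # 2
print(optimal_depth_tree(X, y, 2)[0])  # 0
```

Memoizing on the pair (data subset, remaining depth) is what turns the brute-force recursion into a dynamic program here; it relies on the objective being additive over the two subtrees.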
