Foundational theory for optimal decision tree problems. I. Algorithmic and geometric foundations

Xi He

TL;DR

This work delivers a rigorous, executable framework for optimal decision tree problems by introducing four formal, axiomatic definitions spanning size- and depth-constrained settings. It builds a bridge from brute-force specifications to dynamic programming and related efficient strategies where monotonicity allows, while also detailing when DP is not feasible. The paper additionally extends the theory to binary-feature data and lays out a geometric foundation for hypersurface-based splits via Veronese embeddings, establishing that hypersurface DTs are tractable under the proposed axioms. A key contribution is the disciplined treatment of problem specification, search spaces, and acceleration mechanisms (filtering and thinning), which collectively enable provable optimality and scalable computation. Part II will present the first optimal hypersurface decision tree algorithms and comprehensive experiments comparing axis-parallel, oblique, and hypersurface splits.

Abstract

In the first paper (Part I) of this two-part series, we introduce four novel definitions of the optimal decision tree (ODT) problem: three for size-constrained trees and one for depth-constrained trees. These definitions are stated unambiguously through executable recursive programs, satisfying all criteria we propose for a formal specification. In this sense, they resemble the "standard form" used in the study of general-purpose solvers. Grounded in algebraic programming theory, a relational formalism for deriving correct-by-construction algorithms from specifications, we can not only establish the existence or nonexistence of dynamic programming solutions but also derive them constructively whenever they exist. Consequently, the four generic problem definitions yield four novel optimal algorithms for ODT problems with arbitrary splitting rules that satisfy the axioms and with objective functions of a given form. These algorithms encompass the known depth-constrained, axis-parallel ODT algorithm as a special case, while providing a unified, efficient, and elegant solution for the general ODT problem. In Part II, we present the first optimal hypersurface decision tree algorithm and provide comprehensive experiments against axis-parallel decision tree algorithms, including heuristic CART and state-of-the-art optimal methods. The results demonstrate the significant potential of decision trees with flexible splitting rules. Moreover, our framework is readily extensible to support algorithms for constructing even more flexible decision trees, including those with mixed splitting rules.

Paper Structure

This paper contains 62 sections, 18 theorems, 103 equations, 1 figure, 3 tables, 2 algorithms.

Key Result

Theorem 3

A decision tree consisting of $K$ splitting rules corresponds to a unique $K$-permutation if and only if it is proper. In other words, there exists an injective mapping from proper decision trees to valid $K$-permutations (i.e., $K$-permutations that satisfy the proper decision tree axiom).

Figures (1)

  • Figure 1: The ancestry relation graph (left) captures all ancestry relations between four splitting rules $\left[r_{0},r_{1},r_{2},r_{3}\right]$. In this graph, nodes represent rules, and arrows represent ancestral relations. An incoming arrow from $r_{j}$ to a node $r_{i}$ indicates that $r_{j}$ is the right-child of $r_{i}$. An outgoing arrow from $r_{i}$ to a node $r_{j}$ indicates that $r_{j}$ is the left-child of $r_{i}$. The absence of an arrow indicates no ancestral relation. In the ancestral relation matrix $\boldsymbol{K}$ (right), the entries $\boldsymbol{K}_{ij}=1$, $\boldsymbol{K}_{ij}=-1$, and $\boldsymbol{K}_{ij}=0$ indicate, respectively, that $r_{j}$ lies on the positive side of $r_{i}$, that it lies on the negative side of $r_{i}$, or that there is no ancestry relation between them.
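
To make the matrix encoding in the caption concrete, the following is a minimal sketch of how an ancestral relation matrix could be filled in from a small tree. It is an illustration under stated assumptions, not the paper's construction: the nested-tuple tree representation, the helper names `rules_in` and `ancestry_matrix`, the example tree, and the convention that the left subtree is the positive side (+1) and the right subtree the negative side (-1) are all hypothetical choices made here.

```python
# Assumed representation: a tree is either None (a leaf) or a tuple
# (rule_index, left_subtree, right_subtree). Not taken from the paper.

def rules_in(tree):
    """Return the indices of all splitting rules in a subtree."""
    if tree is None:
        return []
    rule, left, right = tree
    return [rule] + rules_in(left) + rules_in(right)


def ancestry_matrix(tree, n_rules):
    """Build K with K[i][j] = +1 / -1 / 0.

    Assumption: rules in the left subtree of r_i are on its positive
    side (+1), rules in the right subtree on its negative side (-1).
    A 0 entry means r_i is not an ancestor of r_j.
    """
    K = [[0] * n_rules for _ in range(n_rules)]

    def visit(t):
        if t is None:
            return
        rule, left, right = t
        for j in rules_in(left):
            K[rule][j] = 1
        for j in rules_in(right):
            K[rule][j] = -1
        visit(left)
        visit(right)

    visit(tree)
    return K


# Hypothetical example: r0 is the root, r1 its left child,
# r2 its right child, and r3 the left child of r2.
tree = (0, (1, None, None), (2, (3, None, None), None))
K = ancestry_matrix(tree, 4)
for row in K:
    print(row)
```

In this sketch each row $i$ records only the descendants of $r_{i}$, so a leaf-level rule such as $r_{1}$ or $r_{3}$ has an all-zero row.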

Theorems & Definitions (27)

  • Definition 1
  • Definition 2
  • Theorem 3
  • Lemma 4
  • Theorem 5
  • Lemma 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • Definition 10
  • ...and 17 more