Proper decision trees: An axiomatic framework for solving optimal decision tree problems with arbitrary splitting rules

Xi He, Max A. Little

TL;DR

The paper develops an axiomatic framework that classifies decision-tree problems into a proper, structurally constrained class and non-proper variants, enabling a unified dynamic-programming approach. By proving that proper trees correspond uniquely to $K$-permutations via level-order traversal, it derives a generic DP recursion with complexity $O(K! \times M^K)$ for fixed $K$ and analyzes the impracticality of memoization in this setting. It then constructs complete and efficient generators for the search space using downward accumulation and fusion, and introduces acceleration techniques like prefix-closed filtering and thinning. The framework is demonstrated across BSP, axis-parallel/hyperplane/hypersurface splits, and K-D trees, while extensions to non-proper problems (e.g., ODT-BF, Murtree, MCMP) show the versatility of the approach. Overall, the work provides formal definitions, algorithmic tools, and insights for exact optimal decision-tree construction with arbitrary splitting rules, with implications for theory and practice in combinatorial optimization and related data-structure problems.

Abstract

We present an axiomatic framework for analyzing the algorithmic properties of decision trees. This framework supports the classification of decision tree problems through structural and ancestral constraints within a rigorous mathematical foundation. The central focus of this paper is a special class of decision tree problems, which we term proper decision trees, chosen for their versatility and effectiveness. In terms of versatility, this class subsumes several well-known data structures, including binary space partitioning trees, K-D trees, and machine learning decision tree models. Regarding effectiveness, we prove that only proper decision trees can be uniquely characterized as $K$-permutations, whereas typical non-proper decision trees correspond to binary-labeled decision trees with substantially greater complexity. Using this formal characterization, we develop a generic algorithmic approach for solving optimal decision tree problems over arbitrary splitting rules and objective functions for proper decision trees. We constructively derive a generic dynamic programming recursion for solving these problems exactly. However, we show that memoization is generally impractical in terms of space complexity, as both datasets and subtrees must be stored. This result contradicts claims in the literature that suggest a trade-off between memoizing datasets and subtrees. Our framework further accommodates constraints such as tree depth and leaf size, and can be accelerated using techniques such as thinning. Finally, we extend our analysis to several non-proper decision trees, including the commonly studied decision tree over binary feature data, the binary search tree, and the tree structure arising in the matrix chain multiplication problem. We demonstrate how these problems can be solved by appropriately modifying or discarding certain axioms.
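
To make the notion of a $K$-permutation concrete, the following minimal Python sketch enumerates all ordered selections of $K$ distinct rules from a small rule list. The rule names and the helper `k_permutations` are hypothetical and chosen only for illustration. The sketch shows the raw combinatorial objects only; the structural and ancestral constraints that the paper uses to decide which permutations correspond to proper decision trees are not reproduced here.

```python
from itertools import permutations
from math import perm


def k_permutations(rules, k):
    """Return every ordered selection of k distinct rules (a k-permutation)."""
    return list(permutations(rules, k))


# Hypothetical rule labels; in practice these would be splitting rules.
rules = ["r0", "r1", "r2", "r3"]
candidates = k_permutations(rules, 3)

# The number of k-permutations of n items is n! / (n - k)!.
assert len(candidates) == perm(len(rules), 3)
print(len(candidates), candidates[0])
```

For four rules and $K=3$ this yields $4 \cdot 3 \cdot 2 = 24$ ordered selections, which illustrates how quickly the candidate set grows with both the number of rules and $K$.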

Paper Structure

This paper contains 48 sections, 12 theorems, 70 equations, 11 figures.

Key Result

theorem 1

Simplified optimal decision tree problem. Assume a list of rules $\mathit{rs}:\left[\mathcal{R}\right]$ and a size constraint $K:\mathbb{N}$. Let the search space $\mathcal{S}\left(K,\mathit{rs}\right)$ of size-$K$ decision trees be defined by the program $\mathit{genDTKs}\left(K,\mathit{rs}\right)$. where the symbol "$\subseteq$" indicates that the solution on the left-hand side is also a solution

Figures (11)

  • Figure 1: Panels (a–c) describe three types of proper decision tree problems: (a) Axis-parallel decision tree model in machine learning: This model uses axis-parallel splits to divide data into regions. For example, panel (a) shows four splits creating five regions (leaves), with predictions based on the majority class or average value in each region. This can be extended to more complex splits like hyperplanes or hypersurfaces. (b) Binary space partition tree: A segment-based decision tree that divides space into unique cells, each corresponding to a leaf in the tree, aiming for a minimal structure. (c) $K$-D tree: Similar to the axis-parallel tree, but nodes at the same level split along the same dimension. For instance, nodes $B$ and $C$ split along the $x_{2}$-axis, while nodes $D$ and $E$ split along the $x_{1}$-axis. Panels (d–f) illustrate non-proper decision tree problems: (d) Matrix chain multiplication problem: This seeks the optimal order for multiplying matrices to minimize computational cost. Panel (d) shows five ways to multiply four matrices, with the tree's structure defining the order. (e) Optimal decision tree problem over binary feature data: Unlike axis-parallel trees, splits are based on binary questions (e.g., "Is feature $A$ present?"). Paths go left for "yes" and right for "no." In panel (e), figure (ii) shows a tree that classifies the four data points with three features shown in (i). (f) Binary search tree (BST): Given three nodes $r_1 \leq r_2 \leq r_3$, this panel illustrates five possible BSTs for these nodes. Each BST classifies nodes smaller than the root to the left and nodes greater than the root to the right.
  • Figure 2: The ancestry relation graph (left) captures all ancestry relations between four splitting rules $\left[r_{0},r_{1},r_{2},r_{3}\right]$. In this graph, nodes represent rules, and arrows represent ancestral relations. An incoming arrow from $r_{i}$ to a node $r_{j}$ indicates that $r_{j}$ is the right-child of $r_{i}$ (read the arrow next to $r_{i}$). The absence of an arrow indicates no ancestral relation. An outgoing arrow from $r_{i}$ to a node $r_{j}$ indicates that $r_{j}$ is the left-child of $r_{i}$. The ancestral relation matrix $\boldsymbol{K}$ (right) has elements $\boldsymbol{K}_{ij}=1$, $\boldsymbol{K}_{ij}=-1$, and $\boldsymbol{K}_{ij}=0$, which indicate that $r_{j}$ lies on the positive side of $r_{i}$, lies on the negative side of $r_{i}$, or has no ancestry relation with $r_{i}$, respectively.
  • Figure 3: An axis-parallel decision tree model (left), a hyperplane (oblique) decision tree model (middle), and a hypersurface (defined by degree-$2$ polynomials) decision tree model (right), characterized by one, two, and five points, respectively. As the complexity of the splitting functions increases, the tree's complexity decreases (involving fewer splitting nodes).
  • Figure 4: A decision tree with three splitting rules, corresponding to the 3-permutation $\left[r_{1},r_{2},r_{3}\right]$.
  • Figure 5: Four hyperplanes in $\mathbb{R}^{2}$. The black circles represent data points used to define these hyperplanes, and the black arrows indicate the direction of the hyperplanes.
  • ...and 6 more figures
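
The ancestral relation matrix of Figure 2 and the permutation view of Figure 4 can be combined in a small sketch. The Python code below is an illustration under stated assumptions, not the paper's algorithm: it assumes a hypothetical three-rule matrix `rel`, assumes that each new rule is placed by descending from the root and following the sign of `rel[i][j]`, and treats a zero entry on the descent path as infeasible. The names `Node`, `build_tree`, and `level_order` are introduced here for illustration only.

```python
from collections import deque


class Node:
    def __init__(self, rule):
        self.rule = rule
        self.pos = None  # subtree on the positive side (assumed convention)
        self.neg = None  # subtree on the negative side (assumed convention)


def build_tree(order, rel):
    """Insert rules in the given order, descending by the sign of rel[i][j]."""
    root = Node(order[0])
    for j in order[1:]:
        node = root
        while True:
            sign = rel[node.rule][j]
            if sign == 0:
                # No ancestry relation: rule j cannot sit below node.rule.
                raise ValueError(f"rule {j} has no relation to rule {node.rule}")
            side = "pos" if sign == 1 else "neg"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(j))
                break
            node = child
    return root


def level_order(root):
    """Read rules back in breadth-first order, positive side first."""
    out, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        out.append(node.rule)
        queue.extend(c for c in (node.pos, node.neg) if c is not None)
    return out


# Hypothetical matrix: rule 1 is on the positive side of rule 0, rule 2 on
# the negative side of rule 0, and rules 1 and 2 are unrelated to each other.
rel = [[0, 1, -1],
       [1, 0, 0],
       [-1, 0, 0]]

print(level_order(build_tree([0, 1, 2], rel)))
print(level_order(build_tree([0, 2, 1], rel)))
```

In this toy setting both insertion orders produce the same tree, and the level-order traversal returns a single ordering for it, which is the intuition behind using level-order traversal to pick one permutation per tree. An ordering such as `[1, 2, 0]` raises `ValueError` here because rules 1 and 2 have no ancestry relation.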

Theorems & Definitions (36)

  • theorem 1
  • definition 1
  • definition 2
  • definition 3
  • definition 4
  • theorem 2
  • proof
  • corollary 1
  • proof
  • lemma 1
  • ...and 26 more