Classification via Two-Way Comparisons

Marek Chrobak, Neal E. Young

TL;DR

This work addresses the problem of computing minimum-cost two-way-comparison decision trees (2WDTs) for a weighted, ordered query set $Q$ partitioned into classes, where each query is classified using equality and less-than tests and the objective is to minimize $\sum_{q\in Q} w(q)\cdot \mathrm{depth}(q)$. The authors introduce a laminar-decision-tree (LDT) framework, prove an imbalance theorem and a bound on path structure via a generalized rotation, and show that some optimal tree is admissible. They then present a dynamic-programming algorithm with running time $O(n^3 m)$ (where $n=|Q|$ and $m=\sum_{c\in\mathcal{C}}|c|$) to compute minimum-cost 2WDTs, extendable to multi-class scenarios and to other inequality tests (e.g., $\le$). The results yield the first polynomial-time algorithm for minimum-cost 2WDTs and have practical implications for efficient dispatch and classification trees, with a clear path to extensions and deterministic implementations. The work also clarifies weight-handling issues through tie-breaking perturbations and provides a robust framework for laminar tests beyond the strictly two-test setting.
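
The objective $\sum_{q\in Q} w(q)\cdot \mathrm{depth}(q)$ can be made concrete by evaluating it on a small tree. The sketch below is only illustrative: the `Leaf` and `Test` classes, the helper names, and the four-query toy instance are assumptions introduced here and do not come from the paper. It models each internal node as either an equality test or a less-than test against a key, as in the problem statement.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    cls: str            # class label returned for every query reaching this leaf


@dataclass
class Test:
    op: str             # "=" for an equality test, "<" for a less-than test
    key: int            # the key k that the query is compared against
    yes: "Node"         # subtree followed when the test succeeds
    no: "Node"          # subtree followed when the test fails


Node = Union[Leaf, Test]


def classify(tree, q):
    """Return (class label, number of tests performed) for query q."""
    depth = 0
    node = tree
    while isinstance(node, Test):
        passed = (q == node.key) if node.op == "=" else (q < node.key)
        node = node.yes if passed else node.no
        depth += 1
    return node.cls, depth


def cost(tree, weights):
    """Weighted depth: sum over queries of w(q) * depth(q)."""
    return sum(w * classify(tree, q)[1] for q, w in weights.items())


def is_correct(tree, classes):
    """Check that every query reaches a leaf labeled with its class."""
    return all(classify(tree, q)[0] == c for q, c in classes.items())


# Hypothetical toy instance: query 2 is heavy and is the only member of class B.
weights = {1: 1, 2: 5, 3: 1, 4: 1}
classes = {1: "A", 2: "B", 3: "A", 4: "A"}

# One equality test separates class B from class A.
eq_tree = Test("=", 2, Leaf("B"), Leaf("A"))

# Using only less-than tests, two comparisons are needed to isolate query 2.
lt_tree = Test("<", 2, Leaf("A"), Test("<", 3, Leaf("B"), Leaf("A")))

print(cost(eq_tree, weights), cost(lt_tree, weights))
```

On this toy instance the single equality test gives cost 8, while the tree restricted to less-than tests gives cost 15, which illustrates why allowing both kinds of test can reduce cost.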

Abstract

Given a weighted, ordered query set $Q$ and a partition of $Q$ into classes, we study the problem of computing a minimum-cost decision tree that, given any query $q$ in $Q$, uses equality tests and less-than comparisons to determine the class to which $q$ belongs. Such a tree can be much smaller than a lookup table, and much faster and smaller than a conventional search tree. We give the first polynomial-time algorithm for the problem. The algorithm extends naturally to the setting where each query has multiple allowed classes.
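
To see what "minimum-cost" means on a tiny instance, the optimum can be found by exhaustive search. The sketch below is explicitly not the paper's polynomial-time algorithm: it is an exponential-time brute force over subsets of queries, written here only as an illustration. The function name `min_cost` and the example instances are assumptions, and the sketch treats every query as an ordinary comparable value with a non-negative weight.

```python
from functools import lru_cache


def min_cost(weights, classes):
    """Brute-force minimum weighted depth over trees of '=' and '<' tests.

    weights: dict mapping each query to a non-negative weight
    classes: dict mapping each query to its class label
    Runs in exponential time; suitable only for very small instances.
    """

    @lru_cache(maxsize=None)
    def opt(S):
        # S is a sorted tuple of the queries that reach the current node.
        if len({classes[q] for q in S}) == 1:
            return 0  # a single leaf suffices, so no further tests are paid for
        total = sum(weights[q] for q in S)  # every query in S pays for one test
        best = float("inf")
        # Equality test "= k": splits S into {k} and S without k.
        for k in S:
            rest = tuple(q for q in S if q != k)
            best = min(best, total + opt((k,)) + opt(rest))
        # Less-than test "< S[i]": splits S into a proper prefix and suffix.
        for i in range(1, len(S)):
            best = min(best, total + opt(S[:i]) + opt(S[i:]))
        return best

    return opt(tuple(sorted(weights)))


# Hypothetical toy instances.
print(min_cost({1: 1, 2: 5, 3: 1, 4: 1}, {1: "A", 2: "B", 3: "A", 4: "A"}))
print(min_cost({1: 1, 2: 1, 3: 1}, {1: "A", 2: "B", 3: "C"}))
```

Only tests that split the current set into two non-empty parts are tried, since a test that sends every remaining query the same way adds cost without separating anything. The number of distinct subsets reached can grow exponentially, which is the obstacle a polynomial-time algorithm has to avoid.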

Paper Structure (15 sections, 8 theorems, 6 equations, 7 figures)

Key Result

theorem 1

Let $T$ be any optimal, irreducible tree for an LDT instance $I=(Q, w, \mathcal{C}, \mathcal{F})$. Let $u_{1}\to u_{2}\to \cdots \to u_{d}$ be the downward path from any node $u_{1}$ to any proper descendant $u_{d}$ in $T$ such that ${w({u_{2}'})} < {w({u_{d}})}$. Then the outcomes leaving $u_{1}\to

Figures (7)

  • Figure 1: An optimal two-way-comparison decision tree (2WDT) for the problem instance shown on the right. The instance (but not the tree) is from Chambers and Chen (1999). Each internal node represents a comparison between the given query and the node's key $k$: either an equality test, represented as "$=\!k$", or a less-than test, represented as "$<\!k$". Each leaf (rectangle) is labeled with the queries that reach it, and below that with the class for the leaf. The table gives the class and weight of each query $q\in Q= [50] = \{1, 2,\ldots, 50\}$. The tree has cost 2055, about 11% cheaper than the tree from Chambers and Chen (1999), of cost 2305.
  • Figure 2: Tree (a) is a three-way-comparison search tree (3WST). Tree (b) is a two-way-comparison search tree (2WST) for the same instance. The query (or interval of queries) reaching each (rectangular) leaf is within the leaf. The weight of the query (or interval) is below the leaf.
  • Figure 3: Three trees for the 2WDT instance shown in (d). The set of queries reaching each (rectangular) leaf is shown within the leaf (to save space, there $\iota_i$ denotes the inter-key open interval with right boundary $i$, e.g. $\iota_1 = (-\infty, 1)$, $\iota_2 = (1, 2)$). The associated weights are below the leaf. The optimal tree (a) has cost 36 and is not heaviest-first. Each heaviest-first tree (e.g. (b) of cost 37 or (c) of cost 39) is not optimal. These properties also hold if each weight is perturbed to make the weights distinct. (Note: in our formal model, the inter-key intervals will be represented by virtual non-key queries.)
  • Figure 4: Rotating a non-root test node $b$ in $T$ moves $b$ (along with its preferred child $c'$ and the subtree rooted at $c'$) above its parent $a$. Unlike binary search trees, laminar search trees are not inherently ordered. When drawing a rotation in a laminar tree, we draw the first tree $T_a$ using any convenient order, then, when drawing the rotated tree $T'_b$, order each node's two outcomes the same as they were ordered in $T_a$. Above, (i) and (ii) are two ways of drawing the exact same rotation. Throughout, $u'$ denotes the sibling of a given node $u$ in the original tree $T$, in which $T_a$ is a subtree.
  • Figure 5: The sequence of rotations in the proof of the imbalance theorem. The drawing orders the initial tree $T^{q-1}=T$ so the path $u_{p}\to\cdots\to u_{q}$ lies on the left spine. The case $(p, q) = (2, 5)$ is shown in (a). For the general case, (b) shows the first and last two trees in the sequence. In each rotation except the last, the preferred outcome of $u_{q}$ is $u_{q}\to u_{q+1}'$. The preferred outcome is drawn to the right, so the rotation is of the form shown in Figure 4(i). It moves $u_{q}$ (and the preferred outcome $u_{q}\to u_{q+1}'$) above $u_{i}$. Finally, in the last rotation, the preferred outcome of $u_{q}$ is $u_{q}\to u_{q+1}$. The preferred outcome is drawn to the left, so the rotation is of the form shown in Figure 4(ii). This rotation moves the root $u_{p}$ down and out of the path.
  • ...and 2 more figures

Theorems & Definitions (23)

  • theorem 1
  • proof
  • definition 1
  • definition 2
  • proof
  • proof: Proof of the imbalance theorem
  • theorem 2
  • proof
  • lemma 1
  • proof
  • ...and 13 more