A Simple Approximation Algorithm for Optimal Decision Tree

Zhengjia Zhuo; Viswanath Nagarajan

TL;DR

The paper studies the Optimal Decision Tree (ODT) problem, where the goal is to identify the true hypothesis among $m$ candidates using sequential queries with arbitrary costs and responses. ODT is NP-hard, and it is NP-hard to approximate better than $\ln m$. The paper proposes a simple greedy policy that, at each state, selects the query maximizing the expected number of newly eliminated hypotheses per unit cost, and proves an approximation ratio of $8\cdot(1+\ln m)$. The analysis adapts techniques from adaptive submodular cover. It introduces a Stem$(w)$ construction and a key lower bound that relates greedy progress to optimal progress via the quantities $a_t$ and $o_{t/L}$. The result is a practical algorithm with a small leading constant in the fully general ODT setting, with implications for active learning, entity identification, and medical diagnosis.
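The greedy rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The instance encoding (`costs`, `responses`, `probs`) and all function names are hypothetical. The expectation is taken over the true hypothesis drawn from the prior restricted to the hypotheses still consistent with the answers so far, and ties are broken by the lowest query index.

```python
from typing import List, Sequence

# Hypothetical instance encoding (an assumption of this sketch, not the paper's):
#   costs[q]        positive cost of query q
#   responses[q][h] response of query q when hypothesis h is the true one
#   probs[h]        positive prior probability that hypothesis h is the true one


def greedy_score(q: int, consistent: List[int], costs: Sequence[float],
                 responses: Sequence[Sequence[int]], probs: Sequence[float]) -> float:
    """Expected number of consistent hypotheses that query q eliminates, per unit cost."""
    total_p = sum(probs[h] for h in consistent)
    expected_elim = 0.0
    for h in consistent:
        # If h is true, q rules out every consistent h2 whose response differs from h's.
        eliminated = sum(1 for h2 in consistent if responses[q][h2] != responses[q][h])
        expected_elim += (probs[h] / total_p) * eliminated
    return expected_elim / costs[q]


def run_greedy(true_h: int, costs: Sequence[float],
               responses: Sequence[Sequence[int]], probs: Sequence[float]) -> float:
    """Simulate the greedy policy when true_h is the true hypothesis; return the cost paid."""
    consistent = list(range(len(probs)))
    paid = 0.0
    while len(consistent) > 1:
        scores = [greedy_score(q, consistent, costs, responses, probs)
                  for q in range(len(costs))]
        best_q = max(range(len(costs)), key=lambda q: scores[q])  # lowest index wins ties
        if scores[best_q] == 0.0:
            raise ValueError("remaining hypotheses cannot be distinguished by any query")
        paid += costs[best_q]
        observed = responses[best_q][true_h]
        # Keep only hypotheses consistent with the observed response.
        consistent = [h for h in consistent if responses[best_q][h] == observed]
    return paid


def expected_greedy_cost(costs: Sequence[float], responses: Sequence[Sequence[int]],
                         probs: Sequence[float]) -> float:
    """Expected cost of the greedy policy under the prior probs."""
    return sum(probs[h] * run_greedy(h, costs, responses, probs)
               for h in range(len(probs)))
```

For example, with four equally likely hypotheses, two unit-cost queries with responses `[0, 0, 1, 1]` and `[0, 1, 0, 1]`, and a cost-3 query with responses `[0, 1, 2, 3]`, the sketch asks the two unit-cost queries and `expected_greedy_cost` returns 2.0.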

Abstract

Optimal decision tree (ODT) is a fundamental problem arising in applications such as active learning, entity identification, and medical diagnosis. An instance of ODT is given by $m$ hypotheses, out of which an unknown "true" hypothesis is drawn according to some probability distribution. An algorithm needs to identify the true hypothesis by making queries: each query incurs a cost and has a known response for each hypothesis. The goal is to minimize the expected query cost to identify the true hypothesis. We consider the most general setting with arbitrary costs, probabilities and responses. ODT is NP-hard to approximate better than $\ln m$, and $O(\ln m)$-approximation algorithms are known for it. However, these algorithms and/or their analyses are quite complex, and their leading constant factors are large. We provide a simple algorithm and analysis for ODT, proving an approximation ratio of $8 \ln m$.
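To make the objective concrete, the following sketch computes the exact minimum expected query cost of a tiny instance by exhaustive recursion over the set $S$ of hypotheses still consistent with the observed responses. It uses the recurrence $\mathrm{best}(S) = \min_q \big[c_q \cdot p(S) + \sum_r \mathrm{best}(S_r)\big]$, where $p(S)$ is the prior mass of $S$ and $S_r$ is the part of $S$ giving response $r$ to query $q$. It runs in exponential time and is only a reference point for toy inputs, not a practical algorithm. The encoding and names are hypothetical. The sketch assumes every query has positive cost and returns infinity when some remaining hypotheses cannot be told apart by any query.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Sequence, Set

# Hypothetical encoding (an assumption of this sketch): costs[q] > 0 is the cost of
# query q, responses[q][h] is q's response under hypothesis h, probs[h] is the prior.


def optimal_expected_cost(costs: Sequence[float], responses: Sequence[Sequence[int]],
                          probs: Sequence[float]) -> float:
    """Exact minimum expected query cost, by exhaustive recursion (exponential time)."""

    @lru_cache(maxsize=None)
    def best(state: FrozenSet[int]) -> float:
        # best(S) = min over decision trees for S of sum_{h in S} probs[h] * cost(h's path).
        if len(state) <= 1:
            return 0.0  # a single consistent hypothesis is already identified
        mass = sum(probs[h] for h in state)
        result = float("inf")  # stays inf if no query splits the state
        for q, c in enumerate(costs):
            # Partition the consistent hypotheses by their response to q.
            parts: Dict[int, Set[int]] = {}
            for h in state:
                parts.setdefault(responses[q][h], set()).add(h)
            if len(parts) < 2:
                continue  # q cannot distinguish anything in this state
            # Every hypothesis in S pays c; each branch is then solved recursively.
            value = c * mass + sum(best(frozenset(part)) for part in parts.values())
            result = min(result, value)
        return result

    return best(frozenset(range(len(probs))))
```

For instance, with four equally likely hypotheses, unit-cost queries with responses `[0, 0, 1, 1]` and `[0, 1, 0, 1]`, and a cost-3 query with responses `[0, 1, 2, 3]`, `optimal_expected_cost` returns 2.0: ask one unit-cost query, then the other.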
