Superconstant Inapproximability of Decision Tree Learning

Caleb Koch; Carmen Strassle; Li-Yang Tan

TL;DR

This paper proves that the task of properly PAC-learning decision trees with queries remains NP-hard even if the hypothesis size is allowed to be a constant factor larger than the target, i.e., $s' \le C \cdot s$ for any fixed $C>1$. The authors provide a two-step reduction from Vertex Cover: first establishing slight inapproximability and then amplifying it via a new XOR lemma for decision trees to achieve superconstant inapproximability. They also supply a simpler proof of the prior KST23 result and discuss implications for Decision Tree Minimization, strengthening known hardness bounds. These results imply that efficiently recovering near-optimal decision-tree representations from query access is unlikely under standard complexity assumptions, with meaningful consequences for interpretability workflows in decision-tree-based models.

Abstract

We consider the task of properly PAC learning decision trees with queries. Recent work of Koch, Strassle, and Tan showed that the strictest version of this task, where the hypothesis tree $T$ is required to be optimally small, is NP-hard. Their work leaves open the question of whether the task remains intractable if $T$ is only required to be close to optimal, say within a factor of 2, rather than exactly optimal. We answer this affirmatively and show that the task indeed remains NP-hard even if $T$ is allowed to be within any constant factor of optimal. More generally, our result allows for a smooth tradeoff between the hardness assumption and the inapproximability factor. As Koch et al.'s techniques do not appear to be amenable to such a strengthening, we first recover their result with a new and simpler proof, which we couple with a new XOR lemma for decision trees. While there is a large body of work on XOR lemmas for decision trees, our setting necessitates parameters that are extremely sharp, and are not known to be attainable by existing XOR lemmas. Our work also carries new implications for the related problem of Decision Tree Minimization.

Paper Structure (44 sections, 19 theorems, 32 equations, 5 figures)

Introduction
KST23.
This work
Background and Context
Algorithms for properly learning decision trees
Strictly-proper learning via dynamic programming.
Weakly-proper learning via Ehrenfeucht--Haussler.
Lower bounds for random example learners
Other related work: improper learning of decision trees
Technical Overview
Step 1: Slight inapproximability
Key ingredients in the proof of the slight-inapproximability claim (label claim:slight-inapprox): Patch up and hard distribution lemmas.
Step 2: Gap amplification
Preliminaries
Notation and naming conventions.
...and 29 more sections

Key Result

Theorem 1

For every constant $C>1$, there is a constant $\varepsilon>0$ such that the following holds. If there is an algorithm running in time $t(n)$ that, given queries to an $n$-variable function $f$ computable by a decision tree of size $s = O(n)$ and random examples $(\boldsymbol{x},f(\boldsymbol{x}))$ d

Figures (5)

Figure 1: Summary of lower bounds for decision tree learning.
Figure 2: An illustration of the main reduction from Vertex Cover in two steps. The first step, which establishes slight inapproximability of decision tree learning, is proved in the slight-inapproximability claim (label claim:slight-inapprox). The second step amplifies this slight inapproximability gap using the strong-inapproximability claim (label claim:strong-inapproximability).
Figure 3: Using an algorithm for DT-Learn to solve Vertex Cover.
Figure 4: Illustration of a stacked decision tree for a function $f^{(1)}\oplus\cdots\oplus f^{(r)}$. For an input $x=(x^{(1)},\ldots,x^{(r)})$, the decision tree sequentially computes $f^{(i)}(x^{(i)})$ for each $i=1,\ldots,r$ using a decision tree $T^{(i)}$ of size $\mathrm{DT}(f^{(i)})$ for $f^{(i)}$. Then at the leaf it outputs $f^{(1)}(x^{(1)})\oplus\cdots\oplus f^{(r)}(x^{(r)})$. The overall size of the decision tree is $\prod_{i=1}^r\mathrm{DT}(f^{(i)})$.
Figure 5: Using an algorithm for DT-Learn on $\ell\text{-}\textsc{IsEdge}^{\oplus r}$ to solve Vertex Cover.
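
The stacked construction described in the Figure 4 caption can be illustrated with a minimal Python sketch. It assumes that the size of a decision tree means its number of leaves, and it uses a hypothetical tuple encoding of trees (`leaf`, `node`, `stack`, and the example trees `AND2` and `OR2` are illustrative names, not taken from the paper). It only demonstrates the upper bound, namely that stacking yields a tree whose size is the product of the component sizes; it does not reflect the paper's XOR lemma, which concerns the matching lower bound.

```python
from itertools import product

# Hypothetical encoding (not from the paper):
#   ("leaf", bit)              outputs bit
#   ("node", i, lo, hi)        queries x[i]; follows lo if x[i] == 0, else hi

def leaf(bit):
    return ("leaf", bit)

def node(i, lo, hi):
    return ("node", i, lo, hi)

def size(tree):
    """Size of a tree, taken here to be its number of leaves (an assumption)."""
    if tree[0] == "leaf":
        return 1
    return size(tree[2]) + size(tree[3])

def evaluate(tree, x):
    """Follow the root-to-leaf path determined by the input x."""
    while tree[0] == "node":
        tree = tree[3] if x[tree[1]] else tree[2]
    return tree[1]

def stack(trees, widths, parity=0, offset=0):
    """Stacked tree for f1 XOR ... XOR fr on disjoint variable blocks.

    trees[i] reads variables 0..widths[i]-1 of its own block. Each leaf of
    the first tree is replaced by a stacked tree for the remaining functions,
    carrying the running parity down to the final leaves.
    """
    if not trees:
        return leaf(parity)
    first, rest = trees[0], trees[1:]

    def graft(t):
        if t[0] == "leaf":
            return stack(rest, widths[1:], parity ^ t[1], offset + widths[0])
        return node(t[1] + offset, graft(t[2]), graft(t[3]))

    return graft(first)

# Two small example trees on 2 variables each, both with 3 leaves.
AND2 = node(0, leaf(0), node(1, leaf(0), leaf(1)))
OR2 = node(0, node(1, leaf(0), leaf(1)), leaf(1))

components = [AND2, OR2, AND2]
stacked = stack(components, widths=[2, 2, 2])

# The stacked size is the product of the component sizes.
expected_size = 1
for t in components:
    expected_size *= size(t)
print(size(stacked), expected_size)
```

Here each component has 3 leaves, so the stacked tree for the XOR of three components has $3 \cdot 3 \cdot 3$ leaves, matching the $\prod_{i=1}^r \mathrm{DT}(f^{(i)})$ count in the caption when each component tree is optimal.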

Theorems & Definitions (41)

Theorem 1
Theorem 2
Definition 3.1: $\textsc{IsEdge}_G$
Claim 3.2
Lemma 3.3: Patch up lemma
Lemma 3.4: Hard distribution lemma
Lemma 3.5: XOR-ed version of Patch Up Lemma (see the lemma labeled lem:patchup-xor for the exact version)
Lemma 3.6: XOR-ed version of Hard Distribution Lemma (see the lemma labeled lem:hard-distribution-lemma-xor for the exact version)
Claim 3.7
Theorem 3: Hardness of approximating Vertex Cover
...and 31 more
