
Fast decision tree learning solves hard coding-theoretic problems

Caleb Koch, Carmen Strassle, Li-Yang Tan

TL;DR

It is shown that any improvement of Ehrenfeucht and Haussler's algorithm will yield $O(\log n)$-approximation algorithms for $k$-NCP, an exponential improvement of the current state of the art.

Abstract

We connect the problem of properly PAC learning decision trees to the parameterized Nearest Codeword Problem ($k$-NCP). Despite significant effort by the respective communities, algorithmic progress on both problems has been stuck: the fastest known algorithm for the former runs in quasipolynomial time (Ehrenfeucht and Haussler 1989) and the best known approximation ratio for the latter is $O(n/\log n)$ (Berman and Karpinski 2002; Alon, Panigrahy, and Yekhanin 2009). Research on both problems has thus far proceeded independently with no known connections. We show that $\textit{any}$ improvement of Ehrenfeucht and Haussler's algorithm will yield $O(\log n)$-approximation algorithms for $k$-NCP, an exponential improvement of the current state of the art. This can be interpreted either as a new avenue for designing algorithms for $k$-NCP, or as one for establishing the optimality of Ehrenfeucht and Haussler's algorithm. Furthermore, our reduction, along with existing inapproximability results for $k$-NCP, already rules out polynomial-time algorithms for properly learning decision trees. A notable aspect of our hardness results is that they hold even in the setting of $\textit{weak}$ learning, whereas prior ones were limited to the setting of strong learning.
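The quasipolynomial-time baseline above is Ehrenfeucht and Haussler's algorithm. Below is a minimal sketch of its core step, assuming the standard rank-based formulation: search for a decision tree of rank at most $r$ that is consistent with a labeled sample. A tree of rank $r$ has at least $2^r$ leaves, so $r = \lfloor \log_2 s \rfloor$ covers every size-$s$ tree, and each call makes at most $2n$ rank-$(r-1)$ calls plus one rank-$r$ call on $n-1$ variables, giving $n^{O(r)} = n^{O(\log s)}$ time. The function names, the tree encoding, and the truth-table helper are hypothetical illustrations rather than code from the paper, and the sample-size and $\varepsilon$ bookkeeping of the PAC learner is omitted.

```python
from itertools import product

# Hypothetical encoding: a leaf is the int label 0 or 1; an internal node is a
# tuple (i, subtree_if_x_i_is_0, subtree_if_x_i_is_1).


def find_rank_tree(sample, variables, r):
    """Return a decision tree of rank <= r over `variables` that is consistent
    with `sample` (a list of (x, y) pairs), or None if no such tree exists."""
    labels = {y for _, y in sample}
    if len(labels) <= 1:
        # Constant on the sample (or empty sample): a single leaf suffices.
        return labels.pop() if labels else 0
    if r == 0:
        # Rank-0 trees are single leaves, which cannot fit mixed labels.
        return None
    # The root variable of any consistent rank-r tree has a child of rank
    # <= r - 1, so some variable i passes one of the checks below.
    for i in variables:
        rest = variables - {i}
        s0 = [(x, y) for x, y in sample if x[i] == 0]
        s1 = [(x, y) for x, y in sample if x[i] == 1]
        # If one side fits a rank-(r-1) tree, the other side only needs rank r:
        # restricting any consistent rank-r tree to x_i = b yields a consistent
        # tree of rank <= r, so failure there means no rank-r tree exists at all.
        t0 = find_rank_tree(s0, rest, r - 1)
        if t0 is not None:
            t1 = find_rank_tree(s1, rest, r)
            return None if t1 is None else (i, t0, t1)
        t1 = find_rank_tree(s1, rest, r - 1)
        if t1 is not None:
            t0 = find_rank_tree(s0, rest, r)
            return None if t0 is None else (i, t0, t1)
    return None


def evaluate(tree, x):
    """Follow the path of x from the root to a leaf and return its label."""
    while not isinstance(tree, int):
        i, t0, t1 = tree
        tree = t1 if x[i] else t0
    return tree


def truth_table_sample(f, n):
    """Hypothetical helper: all 2^n labeled points, standing in for random examples."""
    return [(x, f(x)) for x in product((0, 1), repeat=n)]
```

For example, $x_0 \wedge x_1$ on three variables has rank 1, so `find_rank_tree(truth_table_sample(lambda x: x[0] & x[1], 3), frozenset(range(3)), 1)` returns a consistent tree, whereas $x_0 \oplus x_1$ needs rank 2 and the same call with `r=1` returns `None`.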

Paper Structure (62 sections, 26 theorems, 32 equations, 3 figures, 1 table)

Key Result

Theorem 1

There is an algorithm that, given random examples $(\boldsymbol{x},f(\boldsymbol{x}))$ where $f : \{0,1\}^n \to \{0,1\}$ is a size-$s$ decision tree and $\boldsymbol{x}$ is drawn from a distribution $\mathcal{D}$ over $\{0,1\}^n$, runs in $\mathrm{poly}(n^{\log s},1/\varepsilon)$ time and returns a

Figures (3)

  • Figure 1: An illustration of the implications of our main result. The top axis denotes different runtimes for (weak) learning $n$-variable size-$s$ decision trees. The bottom axis denotes approximation factors for $k$-NCP. The right hand side of each axis plots the best known algorithms for each respective problem. Each arrow indicates how a decision tree learning algorithm with a particular runtime yields an algorithm for $k$-NCP with a corresponding approximation ratio.
  • Figure 2: An illustration of the decision version of our main result as a series of gap amplification steps. Starting with an instance of $k$-NCP on the left, we perform a series of transformations to obtain an instance of the distinguishing problem on the right. Due to space constraints we have omitted descriptions of the corresponding distributions from the figure. These distributions also go through a series of transformations, from $\mathrm{Unif}(D)$ on the left to $\mathrm{Unif}(\mathrm{Span}(D))_{\oplus \ell}$ on the right.
  • Figure 3: An illustration of the inclusions among basic function classes.

Theorems & Definitions (44)

  • Theorem: EH89
  • Theorem: BK02, APY09
  • Corollary 2.1
  • Corollary 2.2
  • Theorem 1: Our reduction
  • Corollary 2.3
  • Definition 4.1: Decisional $\alpha$-approximate $k$-NCP
  • Theorem 2: Our most general result for decisional approximate $k$-NCP
  • Definition 4.2: Parity check view of decisional $\alpha$-approximate $k$-NCP
  • Definition 4.3
  • ...and 34 more
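For readers new to $k$-NCP, here is a brute-force sketch of the parity-check view (Definition 4.2), assuming the standard formulation: the code is $\{c : Hc = 0\}$ over $\mathrm{GF}(2)$, and the distance from a target $t$ to the code is the minimum Hamming weight of an error $e$ with $He = Ht$. Trying every support of size at most $k$ examines $\sum_{w \le k} \binom{n}{w} = n^{O(k)}$ candidates and decides the exact problem, and hence also the decisional $\alpha$-approximate version for any $\alpha \ge 1$. This is only the trivial algorithm, not the $O(n/\log n)$-approximation algorithms cited above nor the paper's reduction; all function names are hypothetical.

```python
from itertools import combinations


def column_masks(H):
    """Pack each column of the 0/1 matrix H (a list of rows) into an int whose bit r is H[r][c]."""
    return [sum(H[r][c] << r for r in range(len(H))) for c in range(len(H[0]))]


def syndrome(H, t):
    """Compute H t over GF(2) as a bitmask: the XOR of the columns where t has a 1."""
    s = 0
    for mask, bit in zip(column_masks(H), t):
        if bit:
            s ^= mask
    return s


def min_weight_error(H, s, k):
    """Return a lowest-weight support e (a tuple of column indices) with |e| <= k
    and H e = s over GF(2), or None if none exists. Supports are tried in order
    of increasing size 0, 1, ..., k."""
    cols = column_masks(H)
    for w in range(k + 1):
        for support in combinations(range(len(cols)), w):
            acc = 0
            for c in support:
                acc ^= cols[c]
            if acc == s:
                return support
    return None


def within_distance(H, t, k):
    """Decide whether t is within Hamming distance k of the code {c : H c = 0}."""
    return min_weight_error(H, syndrome(H, t), k) is not None
```

With the parity-check matrix of the $[7,4]$ Hamming code, whose columns are the seven nonzero vectors of $\mathrm{GF}(2)^3$, every syndrome is matched by at most one column, so `within_distance(H, t, 1)` is `True` for every target `t`.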