Fast decision tree learning solves hard coding-theoretic problems

Caleb Koch; Carmen Strassle; Li-Yang Tan

TL;DR

It is shown that any improvement of Ehrenfeucht and Haussler's algorithm will yield $O(\log n)$-approximation algorithms for $k$-NCP, an exponential improvement of the current state of the art.

Abstract

We connect the problem of properly PAC learning decision trees to the parameterized Nearest Codeword Problem ($k$-NCP). Despite significant effort by the respective communities, algorithmic progress on both problems has been stuck: the fastest known algorithm for the former runs in quasipolynomial time (Ehrenfeucht and Haussler 1989) and the best known approximation ratio for the latter is $O(n/\log n)$ (Berman and Karpinski 2002; Alon, Panigrahy, and Yekhanin 2009). Research on both problems has thus far proceeded independently with no known connections. We show that $\textit{any}$ improvement of Ehrenfeucht and Haussler's algorithm will yield $O(\log n)$-approximation algorithms for $k$-NCP, an exponential improvement of the current state of the art. This can be interpreted either as a new avenue for designing algorithms for $k$-NCP, or as one for establishing the optimality of Ehrenfeucht and Haussler's algorithm. Furthermore, our reduction, together with existing inapproximability results for $k$-NCP, already rules out polynomial-time algorithms for properly learning decision trees. A notable aspect of our hardness results is that they hold even in the setting of $\textit{weak}$ learning, whereas prior ones were limited to the setting of strong learning.
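To make the coding-theoretic side concrete, the following is a minimal sketch of the problem itself, not of the paper's reduction. It assumes the standard formulation of $k$-NCP: given generators of a binary linear code, a target vector, and a distance parameter $k$, decide whether some codeword lies within Hamming distance $k$ of the target. The function names (`hamming_distance`, `nearest_codeword_distance`, `k_ncp`) are hypothetical, and the search enumerates every codeword, so it takes time exponential in the number of generators.

```python
from itertools import product


def hamming_distance(u, v):
    """Number of coordinates in which the bit vectors u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def nearest_codeword_distance(generators, target):
    """Brute-force distance from `target` to the F_2-span of `generators`.

    Illustrative only: enumerates all 2^m coefficient vectors for m generators.
    """
    n = len(target)
    best = n  # the all-zeros codeword is always in the span, so distance <= n
    for coeffs in product((0, 1), repeat=len(generators)):
        codeword = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                # Addition over F_2 is coordinate-wise XOR.
                codeword = [a ^ b for a, b in zip(codeword, g)]
        best = min(best, hamming_distance(codeword, target))
    return best


def k_ncp(generators, target, k):
    """Decide whether some codeword lies within Hamming distance k of target."""
    return nearest_codeword_distance(generators, target) <= k


if __name__ == "__main__":
    gens = [[1, 1, 0, 0], [0, 0, 1, 1]]
    print(nearest_codeword_distance(gens, [1, 0, 0, 0]))
    print(k_ncp(gens, [1, 0, 0, 0], 1))
```

An $\alpha$-approximation algorithm only needs to distinguish targets within distance $k$ of the code from targets at distance more than $\alpha k$; the abstract's claim concerns achieving $\alpha = O(\log n)$ rather than the currently best known $O(n/\log n)$.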

Paper Structure (62 sections, 26 theorems, 32 equations, 3 figures, 1 table)

Introduction
Properly PAC Learning Decision Trees (DT-Learn).
Parameterized Nearest Codeword Problem ($k$-NCP).
Motivation for both problems
Our results
Statement of our reduction
Addressing the main open problem from EH89.
Comparison with prior work
Inverse-polynomial error.
Constant error.
Summary.
Discussion
Two interpretations of our results.
Decision trees and weak learning in practice.
LPN hardness of uniform-distribution learning?
...and 47 more sections

Key Result

Theorem 1

There is an algorithm that, given random examples $(\boldsymbol{x},f(\boldsymbol{x}))$ where $f : \{0,1\}^n \to \{0,1\}$ is a size-$s$ decision tree and $\boldsymbol{x}$ is drawn from a distribution $\mathcal{D}$ over $\{0,1\}^n$, runs in $\mathrm{poly}(n^{\log s},1/\varepsilon)$ time and returns a

Figures (3)

Figure 1: An illustration of the implications of our main result. The top axis denotes different runtimes for (weak) learning $n$-variable size-$s$ decision trees. The bottom axis denotes approximation factors for $k$-NCP. The right hand side of each axis plots the best known algorithms for each respective problem. Each arrow indicates how a decision tree learning algorithm with a particular runtime yields an algorithm for $k$-NCP with a corresponding approximation ratio.
Figure 2: An illustration of the decision version of our main result as a series of gap amplification steps. Starting with an instance of $k$-NCP on the left, we perform a series of transformations to obtain an instance of the distinguishing problem on the right. Due to space constraints we have omitted descriptions of the corresponding distributions from the figure. These distributions also go through a series of transformations, from $\mathrm{Unif}(D)$ on the left to $\mathrm{Unif}(\mathrm{Span}(D))_{\oplus \ell}$ on the right.
Figure 3: Illustration of inclusions of basic function classes

Theorems & Definitions (44)

Theorem: EH89
Theorem: BK02, APY09
Corollary 2.1
Corollary 2.2
Theorem 1: Our reduction
Corollary 2.3
Definition 4.1: Decisional $\alpha$-approximate $k$-NCP
Theorem 2: The most general theorem, for decisional approximate $k$-NCP
Definition 4.2: Parity check view of decisional $\alpha$-approximate $k$-NCP
Definition 4.3
...and 34 more
