Approximating the Number of Relevant Variables in a Parity Implies Proper Learning

Nader H. Bshouty; George Haddad

TL;DR

This work connects the difficulty of approximating the number of relevant variables $d(f)$ of a parity function $f$ under random classification noise to the difficulty of properly learning parities. It shows that any polynomial-time $\gamma$-approximation of $d(f)$ yields a polynomial-time learner for $k(n)$-sparse parities for some $k(n)=\omega_n(1)$, and that any $T(n)$-time $\gamma$-approximation yields a proper learner for parities that runs in time $\mathrm{poly}(\Gamma(n))\,T(\Gamma(n)^2)$, where $\Gamma(x)=\gamma(\gamma(x))$. The results extend from parities to linear functions over finite fields, a setting relevant to coding theory and cryptography. If $T(\Gamma(n)^2)=\exp(o(n/\log n))$, this would resolve the long-standing open problem of properly learning parities with random classification noise in time $\exp(o(n/\log n))$. Overall, the paper shows that approximating the sparsity of a parity is as hard as properly learning it.
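
As a worked instance of the time bound, take the purely illustrative choice $\gamma(x)=2x$ (an assumption made here for concreteness, not a case singled out by the paper). It is strictly increasing and satisfies $\gamma(x)\ge x$ on $\mathbb{R}^+$, so:

$$\Gamma(n)=\gamma(\gamma(n))=4n,\qquad \Gamma(n)^2=16n^2,\qquad \mathrm{poly}(\Gamma(n))\,T(\Gamma(n)^2)=\mathrm{poly}(n)\,T(16n^2).$$

Under this assumption, a $T(n)$-time factor-2 approximation of $d(f)$ would give a proper learner for parities running in time $\mathrm{poly}(n)\,T(16n^2)$.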

Abstract

Consider the model where we can access a parity function through random uniform labeled examples in the presence of random classification noise. In this paper, we show that approximating the number of relevant variables in the parity function is as hard as properly learning parities. More specifically, let $\gamma:{\mathbb R}^+\to {\mathbb R}^+$, where $\gamma(x) \ge x$, be any strictly increasing function. In our first result, we show that from any polynomial-time algorithm that returns a $\gamma$-approximation, $D$ (i.e., $\gamma^{-1}(d(f)) \leq D \leq \gamma(d(f))$), of the number of relevant variables $d(f)$ for any parity $f$, we can, in polynomial time, construct a solution to the long-standing open problem of polynomial-time learning $k(n)$-sparse parities (parities with $k(n)\le n$ relevant variables), where $k(n) = \omega_n(1)$. In our second result, we show that from any $T(n)$-time algorithm that, for any parity $f$, returns a $\gamma$-approximation of the number of relevant variables $d(f)$ of $f$, we can, in polynomial time, construct a $\mathrm{poly}(\Gamma(n))\,T(\Gamma(n)^2)$-time algorithm that properly learns parities, where $\Gamma(x)=\gamma(\gamma(x))$. If $T(\Gamma(n)^2)=\exp(o(n/\log n))$, this would resolve another long-standing open problem of properly learning parities in the presence of random classification noise in time $\exp(o(n/\log n))$.
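
The access model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not code from the paper: the function name `make_noisy_parity_oracle`, the parameter `eta` for the noise rate, and the 0-based index set `relevant` are all hypothetical names chosen here. Each draw returns a uniform $x\in\{0,1\}^n$ together with the parity of the relevant bits, flipped with probability `eta`.

```python
import random


def make_noisy_parity_oracle(n, relevant, eta, rng=None):
    """Build an example oracle for the parity of the bits indexed by `relevant`.

    Each call to the returned function draws x uniformly from {0,1}^n and
    returns (x, y), where y is the parity f(x) flipped with probability eta.
    All names here are illustrative assumptions, not taken from the paper.
    """
    rng = rng or random.Random()
    relevant = sorted(set(relevant))
    if any(i < 0 or i >= n for i in relevant):
        raise ValueError("relevant indices must lie in range(n)")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be a probability")

    def draw():
        # Uniform random input over {0,1}^n.
        x = [rng.randrange(2) for _ in range(n)]
        # Noiseless label: XOR of the relevant coordinates.
        label = sum(x[i] for i in relevant) % 2
        # Random classification noise: flip the label with probability eta.
        if rng.random() < eta:
            label ^= 1
        return x, label

    return draw


def num_relevant(relevant):
    """d(f): the number of relevant variables of the parity."""
    return len(set(relevant))
```

For example, `make_noisy_parity_oracle(8, [0, 3, 5], 0.1)` simulates a parity with $d(f)=3$ on $n=8$ variables at noise rate 0.1. An approximation algorithm in this model sees only the stream of `(x, y)` pairs, never the set `relevant` itself.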

Paper Structure (11 sections, 13 theorems, 12 equations)

Introduction
Our Technique
Approximation Implies Learning $k$-Sparse Parities
First Approach
The Second Approach
Approximation Implies Learning Parities
Justification for the Use of the $\gamma$-Approximation Definition
Definitions and Preliminaries
Approximation vs. Learning
Approximation Implies Learning Some ${\rm Lin}(\mathbb{F},k)$
Approximation Implies Learning ${\rm Lin}(\mathbb{F})$

Key Result

Theorem 1

Let $\gamma:{\mathbb R}^+\to {\mathbb R}^+$ be any strictly increasing function, where $\gamma(x)\ge x$. Consider a polynomial-time algorithm that, for any parity $f$, uses random uniform labeled examples of $f$ in the presence of random classification noise and returns an integer $D$ such that $\gamma^{-1}(d(f))\le D\le \gamma(d(f))$ [...]
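
The $\gamma$-approximation guarantee, stated in the abstract as $\gamma^{-1}(d(f))\le D\le\gamma(d(f))$, can be made concrete with a short check. This is a hedged sketch: the helper names `is_gamma_approximation` and `candidate_range` are hypothetical, and the caller is assumed to supply both $\gamma$ and its inverse. Because $\gamma$ is strictly increasing, the two inequalities are equivalent to $\gamma^{-1}(D)\le d(f)\le\gamma(D)$, which is what `candidate_range` returns.

```python
def is_gamma_approximation(D, d, gamma, gamma_inv):
    """Check gamma^{-1}(d) <= D <= gamma(d) for a reported value D.

    `gamma` and `gamma_inv` are caller-supplied callables; `gamma` is assumed
    strictly increasing with gamma(x) >= x, and `gamma_inv` is its inverse.
    """
    return gamma_inv(d) <= D <= gamma(d)


def candidate_range(D, gamma, gamma_inv):
    """Interval that must contain d(f) when D is a gamma-approximation.

    Applying gamma to gamma^{-1}(d) <= D gives d <= gamma(D), and applying
    gamma^{-1} to D <= gamma(d) gives gamma^{-1}(D) <= d.
    """
    return gamma_inv(D), gamma(D)


# Illustrative choice gamma(x) = 2x (an assumption, not from the paper).
def double(x):
    return 2 * x


def halve(x):
    return x / 2


# With d(f) = 10, any D in [5, 20] is a valid 2x-approximation.
ok_low = is_gamma_approximation(5, 10, double, halve)
too_big = is_gamma_approximation(21, 10, double, halve)
low, high = candidate_range(5, double, halve)
```

With this choice, a reported value $D=5$ only tells us that $d(f)$ lies between 2.5 and 10, so the true sparsity 10 is consistent with it.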

Theorems & Definitions (23)

Theorem 1
Theorem 2
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Theorem 3
Lemma 4
...and 13 more
