Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent

Yiwen Kou; Zixiang Chen; Quanquan Gu; Sham M. Kakade

TL;DR

This paper solves the $k$-sparse parity problem with sign stochastic gradient descent, a variant of stochastic gradient descent on two-layer fully-connected neural networks, and matches the SQ lower bound for solving the $k$-sparse parity problem using gradient-based methods.

Abstract

The $k$-sparse parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-sparse parity problem with sign stochastic gradient descent, a variant of stochastic gradient descent (SGD) on two-layer fully-connected neural networks. We demonstrate that this approach can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube ($k\leq O(\sqrt{d})$) with a sample complexity of $\tilde{O}(d^{k-1})$ using $2^{\Theta(k)}$ neurons, matching the established $\Omega(d^{k})$ lower bounds of Statistical Query (SQ) models. Our theoretical analysis begins by constructing a good neural network capable of correctly solving the $k$-parity problem. We then demonstrate how a neural network trained with sign SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors. To the best of our knowledge, this is the first result that matches the SQ lower bound for solving the $k$-sparse parity problem using gradient-based methods.
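To make the problem concrete, here is a minimal sketch of how $k$-sparse parity data could be sampled. It assumes the usual formulation (inputs uniform on $\{-1,+1\}^d$, label equal to the product of the coordinates in a hidden support of size $k$); the function name `sample_parity_batch` and the chosen support are hypothetical and not taken from the paper.

```python
import random

def sample_parity_batch(d, support, batch_size, rng):
    """Draw (x, y) pairs: x uniform on {-1, +1}^d, y = product of x[i] for i in support.

    Assumption: this is the standard k-sparse parity formulation; the paper's
    own definition (Definition 3.1) is not reproduced on this page.
    """
    batch = []
    for _ in range(batch_size):
        x = [rng.choice((-1, 1)) for _ in range(d)]
        y = 1
        for i in support:
            y *= x[i]
        batch.append((x, y))
    return batch

rng = random.Random(0)
# Hypothetical instance: d = 10 coordinates, hidden support {0, 1, 2}, so k = 3.
batch = sample_parity_batch(d=10, support=(0, 1, 2), batch_size=4, rng=rng)
for x, y in batch:
    print(x, y)
```

Only the $k$ coordinates in the support influence the label; the remaining $d-k$ coordinates are noise that a learner has to ignore.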

Paper Structure (23 sections, 28 theorems, 125 equations, 7 figures, 3 tables)

Introduction
Our Contributions
Notation.
Related Work
XOR Problem.
$k$-parity Problem.
Problem Setup
Algorithm.
Main Results
Overview of Proof Technique
Warmup: Population Gradient Descent
Stochastic Sign Gradient Descent
Conclusion and Future Work
Experiments
Model.
...and 8 more sections

Key Result

Theorem 1.1

For a two-layer fully-connected neural network of width $2^{\Theta(k)}$, online sign SGD with batch size $\widetilde{O}(d^{k-1})$ can find a solution to the $k$-parity problem with a small test error within $O(k\log d)$ iterations.
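The update rule behind this statement can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact algorithm: it assumes a polynomial activation $z^k$, a fixed second layer $a_r \in \{-1,+1\}$, a correlation-type loss $-y\,f(x)$, the plain sign function, and no regularization. The toy sizes and all function names are hypothetical.

```python
import random

def sign(v):
    """Standard sign: -1, 0, or +1."""
    return (v > 0) - (v < 0)

def sample_batch(d, k, n, rng):
    """x uniform on {-1,+1}^d; label is the product of the first k coordinates (assumed support)."""
    batch = []
    for _ in range(n):
        x = [rng.choice((-1, 1)) for _ in range(d)]
        y = 1
        for i in range(k):
            y *= x[i]
        batch.append((x, y))
    return batch

def batch_gradient(w, a, batch, k):
    """Batch-averaged gradient of the assumed loss -y * a * <w, x>**k with respect to w."""
    g = [0.0] * len(w)
    for x, y in batch:
        z = sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y * a * k * z ** (k - 1)
        for j, xj in enumerate(x):
            g[j] += coeff * xj / len(batch)
    return g

def sign_sgd_step(W, a, batch, k, lr):
    """Move every first-layer weight by lr against the sign of its batch gradient."""
    return [
        [wj - lr * sign(gj) for wj, gj in zip(w_r, batch_gradient(w_r, a_r, batch, k))]
        for w_r, a_r in zip(W, a)
    ]

rng = random.Random(0)
d, k, width, lr = 6, 2, 8, 0.1  # toy sizes chosen for illustration only
W0 = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(width)]
a = [rng.choice((-1, 1)) for _ in range(width)]  # second layer stays fixed (assumption)
W = W0
for _ in range(3):
    # "Online" here means a fresh batch is drawn at every iteration.
    W = sign_sgd_step(W, a, sample_batch(d, k, 64, rng), k, lr)
```

Because only the sign of each gradient coordinate is used, every weight moves by exactly `lr` (or stays put when the coordinate is zero), regardless of the gradient's magnitude.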

Figures (7)

Figure 1: Comparison of the modified sign function $\widetilde{\mathop{\mathrm{sign}}}(x)$ (with $\rho=0.5$) and the standard sign function $\mathop{\mathrm{sign}}(x)$. The $\widetilde{\mathop{\mathrm{sign}}}(x)$ function introduces a 'dead zone' between $-\rho$ and $\rho$ where the function value is zero, which is not present in the standard sign function. This modification creates a threshold effect: the output is non-zero only when the input $x$ lies outside the interval from $-\rho$ to $\rho$.
Figure 2: Illustration of a $2$-parity $\textit{good}$ neuron with initial weights $w_{1,1}^{(0)} = 1$, $w_{1,2}^{(0)} = -1$, and $a_{1} = -1$.
Figure 3: Illustration of a $2$-parity $\textit{bad}$ neuron with initial weights $w_{1,1}^{(0)} = -1$, $w_{1,2}^{(0)} = 1$, and $a_{1} = 1$.
Figure 4: Illustration of a $3$-parity $\textit{good}$ neuron with initial weights $w_{1,1}^{(0)} = 1$, $w_{1,2}^{(0)} = 1$, $w_{1,3}^{(0)} = 1$, and $a_{1} = 1$.
Figure 5: Illustration of a $3$-parity $\textit{bad}$ neuron with initial weights $w_{1,1}^{(0)} = 1$, $w_{1,2}^{(0)} = -1$, $w_{1,3}^{(0)} = 1$, and $a_{1} = 1$.
...and 2 more figures
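The dead-zone behaviour described in the Figure 1 caption can be sketched in a few lines. The caption does not say what happens at exactly $|x| = \rho$, so returning zero there is an assumption of this sketch; the function name `modified_sign` is likewise hypothetical.

```python
def sign(x):
    """Standard sign: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def modified_sign(x, rho):
    """Sign with a dead zone: zero inside [-rho, rho], sign(x) outside.

    Assumption: the boundary |x| == rho is mapped to zero; the caption
    does not specify this case.
    """
    if abs(x) <= rho:
        return 0
    return sign(x)

# Compare both functions on a few inputs, using rho = 0.5 as in Figure 1.
for x in (-1.0, -0.3, 0.0, 0.3, 1.0):
    print(x, sign(x), modified_sign(x, rho=0.5))
```

Inputs of small magnitude, such as 0.3, still get a non-zero standard sign but are zeroed by the modified version, which is the threshold effect the caption refers to.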

Theorems & Definitions (31)

Theorem 1.1: Informal
Definition 3.1: $\boldsymbol{k}$-parity
Remark 3.2
Proposition 4.1
Theorem 4.3
Remark 4.4
Lemma 5.1
Lemma 5.2
Lemma 5.3
Corollary 5.4
...and 21 more
