Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent

Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade

TL;DR

This paper solves the $k$-sparse parity problem with sign stochastic gradient descent, a variant of stochastic gradient descent, on two-layer fully-connected neural networks, and matches the SQ lower bound for solving the $k$-sparse parity problem using gradient-based methods.

Abstract

The $k$-sparse parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-sparse parity problem with sign stochastic gradient descent, a variant of stochastic gradient descent (SGD) on two-layer fully-connected neural networks. We demonstrate that this approach can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube ($k\leq O(\sqrt{d})$) with a sample complexity of $\tilde{O}(d^{k-1})$ using $2^{\Theta(k)}$ neurons, matching the established $\Omega(d^{k})$ lower bounds of Statistical Query (SQ) models. Our theoretical analysis begins by constructing a good neural network capable of correctly solving the $k$-parity problem. We then demonstrate how a neural network trained with sign SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors. To the best of our knowledge, this is the first result that matches the SQ lower bound for solving the $k$-sparse parity problem using gradient-based methods.
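
For concreteness, the $k$-sparse parity task can be sketched as follows. This is a minimal illustration assuming the standard formulation, in which inputs are uniform on $\{-1,+1\}^d$ and the label is the product of the coordinates in a hidden support of size $k$. The names `parity_label` and `sample_batch`, and the particular support, are hypothetical and not taken from the paper.

```python
import random
from math import prod

def parity_label(x, support):
    """Label of a point x in {-1,+1}^d: product of the coordinates in the hidden support."""
    return prod(x[j] for j in support)

def sample_batch(d, support, batch_size, rng):
    """Draw fresh (x, y) pairs with x uniform on the hypercube {-1,+1}^d."""
    batch = []
    for _ in range(batch_size):
        x = [rng.choice((-1, 1)) for _ in range(d)]
        batch.append((x, parity_label(x, support)))
    return batch

# Example: d = 8 with an assumed hidden support {0, 1, 2}, so k = 3.
rng = random.Random(0)
support = (0, 1, 2)
batch = sample_batch(d=8, support=support, batch_size=4, rng=rng)
```

The learner sees only the pairs `(x, y)`; the difficulty lies in identifying which $k$ of the $d$ coordinates determine the label.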

Paper Structure

This paper contains 23 sections, 28 theorems, 125 equations, 7 figures, and 3 tables.

Key Result

Theorem 1.1

For two-layer fully-connected neural networks of width $2^{\Theta(k)}$, online sign SGD with batch size $\widetilde{O}(d^{k-1})$ can find a solution to the $k$-parity problem with a small test error within $O(k\log d)$ iterations.
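
The update rule behind sign SGD can be sketched as below: each weight moves by a fixed step against the sign of its minibatch gradient, so only the direction of the gradient matters and not its magnitude. This is a generic sketch using the standard sign function. The function name `sign_sgd_step`, the learning rate, and the example gradient are illustrative assumptions; the paper's network, loss, and hyperparameters are not reproduced here.

```python
def sign(g):
    """Standard sign function: -1, 0, or +1."""
    return (g > 0) - (g < 0)

def sign_sgd_step(weights, grads, lr):
    """One sign-SGD update: w <- w - lr * sign(g), applied per coordinate."""
    return [w - lr * sign(g) for w, g in zip(weights, grads)]

# Hypothetical minibatch gradient; in the online setting each step would use a fresh batch.
w = [1.0, -1.0, 1.0]
g = [0.3, -2.5, 0.0]
w_next = sign_sgd_step(w, g, lr=0.5)  # [0.5, -0.5, 1.0]
```

Note that the coordinates with gradients 0.3 and -2.5 both move by exactly 0.5, which is the defining difference from plain SGD.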

Figures (7)

  • Figure 1: Comparison between the modified sign function $\widetilde{\mathop{\mathrm{sign}}}(x)$ (with $\rho=0.5$) and the standard sign function $\mathop{\mathrm{sign}}(x)$. The $\widetilde{\mathop{\mathrm{sign}}}(x)$ function introduces a 'dead zone' between $-\rho$ and $\rho$ where the function value is zero, which is not present in the standard sign function. This modification creates a threshold effect: the output is non-zero only when the input $x$ lies outside the interval from $-\rho$ to $\rho$.
  • Figure 2: Illustration of a $2$-parity $\textit{good}$ neuron with initial weights $w_{1,1}^{(0)} = 1$, $w_{1,2}^{(0)} = -1$, and $a_{1} = -1$.
  • Figure 3: Illustration of a $2$-parity $\textit{bad}$ neuron with initial weights $w_{1,1}^{(0)} = -1$, $w_{1,2}^{(0)} = 1$, and $a_{1} = 1$.
  • Figure 4: Illustration of a $3$-parity $\textit{good}$ neuron with initial weights $w_{1,1}^{(0)} = 1$, $w_{1,2}^{(0)} = 1$, $w_{1,3}^{(0)} = 1$, and $a_{1} = 1$.
  • Figure 5: Illustration of a $3$-parity $\textit{bad}$ neuron with initial weights $w_{1,1}^{(0)} = 1$, $w_{1,2}^{(0)} = -1$, $w_{1,3}^{(0)} = 1$, and $a_{1} = 1$.
  • ...and 2 more figures
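
The dead-zone behaviour described in the Figure 1 caption can be sketched in a few lines. This is an illustration based only on that caption, with $\rho = 0.5$ as in the figure; the name `modified_sign` is hypothetical, and treating $|x| = \rho$ as outside the dead zone is an assumption, since the caption does not specify the boundary.

```python
def sign(x):
    """Standard sign function: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def modified_sign(x, rho=0.5):
    """Sign with a dead zone: 0 whenever |x| < rho, otherwise sign(x).

    Treating |x| == rho as outside the dead zone is an assumption.
    """
    return sign(x) if abs(x) >= rho else 0

# Compare the two functions at a few sample points.
for x in (-1.0, -0.2, 0.0, 0.2, 1.0):
    print(x, sign(x), modified_sign(x))
```

Inputs such as 0.2 and -0.2 are mapped to ±1 by the standard sign but to 0 by the modified one, so small values are suppressed rather than amplified to a full-size step.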

Theorems & Definitions (31)

  • Theorem 1.1: Informal
  • Definition 3.1: $\boldsymbol{k}$-parity
  • Remark 3.2
  • Proposition 4.1
  • Theorem 4.3
  • Remark 4.4
  • Lemma 5.1
  • Lemma 5.2
  • Lemma 5.3
  • Corollary 5.4
  • ...and 21 more