Learning One Convolutional Layer with Overlapping Patches

Surbhi Goel, Adam Klivans, Raghu Meka

TL;DR

The paper tackles the challenge of provably learning a one-hidden-layer convolutional network with overlapping patches under mild distributional assumptions. It introduces Convotron, a stochastic update rule inspired by isotonic regression that converges to the true weight vector without special initialization or learning-rate tuning, and that tolerates noise. The authors establish spectral conditions on patch structures (1D and 2D patch-and-stride) that guarantee polynomial-time learnability, derive explicit eigenvalue bounds, and provide corresponding convergence guarantees. Empirical results show that Convotron is more robust than SGD and needs less hyperparameter tuning, highlighting its practical potential for structured convolutional architectures.

Abstract

We give the first provably efficient algorithm for learning a one hidden layer convolutional network with respect to a general class of (potentially overlapping) patches. Additionally, our algorithm requires only mild conditions on the underlying distribution. We prove that our framework captures commonly used schemes from computer vision, including one-dimensional and two-dimensional "patch and stride" convolutions. Our algorithm, Convotron, is inspired by recent work applying isotonic regression to learning neural networks. Convotron uses a simple, iterative update rule that is stochastic in nature and tolerant to noise (requires only that the conditional mean function is a one layer convolutional network, as opposed to the realizable setting). In contrast to gradient descent, Convotron requires no special initialization or learning-rate tuning to converge to the global optimum. We also point out that learning one hidden convolutional layer with respect to a Gaussian distribution and just one disjoint patch $P$ (the other patches may be arbitrary) is easy in the following sense: Convotron can efficiently recover the hidden weight vector by updating only in the direction of $P$.
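
The update rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' reference implementation. It assumes an average-pooled network $f_w(x) = \frac{1}{k}\sum_i \sigma(w^\top P_i x)$ with a leaky ReLU $\sigma$, and an update that moves $w$ along the summed patches scaled by the residual. The function names (`patch_matrices_1d`, `predict`, `convotron`, `demo`), the slope `alpha = 0.5`, the zero starting point, and the step size `eta = 0.02` are all hypothetical choices made for this example.

```python
import numpy as np

def leaky_relu(z, alpha=0.5):
    """Leaky ReLU; alpha = 0 would give the plain ReLU (alpha is an assumed value)."""
    return np.where(z >= 0, z, alpha * z)

def patch_matrices_1d(n, r, d):
    """Stack of k selection matrices (each r x n) for 1D patch-and-stride."""
    k = (n - r) // d + 1
    P = np.zeros((k, r, n))
    for i in range(k):
        # Patch i copies input coordinates i*d, ..., i*d + r - 1.
        P[i, np.arange(r), i * d + np.arange(r)] = 1.0
    return P

def predict(w, P, x, alpha=0.5):
    """Average-pooled one-hidden-layer conv net: mean_i sigma(w . P_i x)."""
    return leaky_relu((P @ x) @ w, alpha).mean()

def convotron(xs, ys, P, eta, alpha=0.5):
    """One pass of a Convotron-style stochastic update over (x, y) samples."""
    w = np.zeros(P.shape[1])  # zero start: one arbitrary choice, no tuning
    for x, y in zip(xs, ys):
        patches = P @ x                                  # shape (k, r)
        residual = y - leaky_relu(patches @ w, alpha).mean()
        w = w + eta * residual * patches.sum(axis=0)     # step along sum_i P_i x
    return w

def demo(seed=0, n=8, r=3, d=1, samples=20000, eta=0.02):
    """Noiseless Gaussian toy problem; returns the final distance to w_star."""
    rng = np.random.default_rng(seed)
    P = patch_matrices_1d(n, r, d)
    w_star = rng.normal(size=r)
    xs = rng.normal(size=(samples, n))
    ys = np.array([predict(w_star, P, x) for x in xs])
    return np.linalg.norm(convotron(xs, ys, P, eta) - w_star)
```

Calling `demo()` runs the sketch on synthetic data with overlapping 1D patches and returns the distance between the learned and the true weight vector. Note that the update uses the residual times the summed patches rather than a gradient, so no derivative of the activation appears.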

Paper Structure

This paper contains 22 sections, 16 theorems, 30 equations, 4 figures, 3 algorithms.

Key Result

Lemma 2.1

For all $a,b \in \mathbb{R}$,

Figures (4)

  • Figure 1: Architecture of convolutional network with one hidden layer and average pooling. Each purple rectangle corresponds to a patch.
  • Figure 2: 2D convolution patches for image size $n_1 = n_2 = 7$, patch size $r_1 = r_2 = 3$, and stride $d_1 = 2$, $d_2 = 1$. The blue box corresponds to patch $(1,1)$, red to patch $(2,1)$, green to patch $(1,2)$, and orange to patch $(3,4)$.
  • Figure 3: Failure probability of SGD (green) vs Convotron (blue) with varying learning rate $\eta$. Experiment 1: Patch and stride 1D (Top-left) and 2D (Top-right). Experiment 2: Input distribution has mean 0 with identity covariance matrix (Bottom-left) and with non-identity covariance matrix (Bottom-right). The curves are shifted because the two methods' updates are scaled differently.
  • Figure 4: $P^{-1}$ for $d = 1$. Here $\alpha = \beta + 0.5$ and $\beta = \frac{0.5}{2k - p} = \frac{0.5}{2n - 3r + 3}$. The shaded area is all 0s.
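
The patch indexing in Figure 2 can be checked with a short sketch. It assumes the standard patch-and-stride count $\lfloor (n - r)/d \rfloor + 1$ per axis and 1-indexed patches whose first coordinate is $(i - 1)d + 1$. The helper names `patch_count`, `patch_span`, and `patch_grid` are hypothetical, as is the convention that the first index of a patch label goes with $(n_1, r_1, d_1)$.

```python
def patch_count(n, r, d):
    """Number of length-r patches that fit in length n with stride d."""
    return (n - r) // d + 1

def patch_span(index, r, d):
    """1-indexed inclusive (first, last) coordinates covered by a 1-indexed patch."""
    first = (index - 1) * d + 1
    return first, first + r - 1

def patch_grid(n1, n2, r1, r2, d1, d2):
    """Number of patches along each of the two axes."""
    return patch_count(n1, r1, d1), patch_count(n2, r2, d2)
```

Under these assumptions, the Figure 2 setting (`patch_grid(7, 7, 3, 3, 2, 1)`) gives a 3 by 5 grid, so the patch labelled $(3,4)$ is in range. It covers coordinates 5 to 7 along the first axis and 4 to 6 along the second.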

Theorems & Definitions (28)

  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Theorem 1: Gershgorin circle theorem (weisstein2003gershgorin)
  • Theorem 2
  • proof
  • Corollary 1
  • proof
  • Lemma 4.1
  • proof
  • ...and 18 more
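
As background for the Gershgorin entry in the list above, the circle theorem says that every eigenvalue of a square matrix lies in at least one disc centered at a diagonal entry, with radius equal to the sum of the absolute off-diagonal entries in that row. The sketch below turns this into eigenvalue bounds for a real symmetric matrix. It is a generic illustration and does not reproduce the specific matrix the paper applies the theorem to. The name `gershgorin_bounds` and the example matrix are hypothetical.

```python
import numpy as np

def gershgorin_bounds(A):
    """Interval [lo, hi] containing all eigenvalues of a real symmetric matrix A."""
    A = np.asarray(A, dtype=float)
    centers = np.diag(A)
    # Row-wise sum of absolute off-diagonal entries gives each disc radius.
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return (centers - radii).min(), (centers + radii).max()

# Hypothetical example matrix, compared against the exact spectrum.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
lo, hi = gershgorin_bounds(A)
eigenvalues = np.linalg.eigvalsh(A)
```

A strictly positive lower bound `lo` certifies that the matrix is positive definite without computing any eigenvalues, which is the kind of argument that makes explicit eigenvalue bounds tractable for structured matrices.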