Learning Halfspaces and Neural Networks with Random Initialization

Yuchen Zhang; Jason D. Lee; Martin J. Wainwright; Michael I. Jordan

TL;DR

The paper analyzes non-convex empirical risk minimization for learning halfspaces and multi-layer neural networks with $L$-Lipschitz losses. It shows that repeated rounds of random initialization followed by simple optimization steps achieve excess risk $ε$ in time polynomial in $n$ and $d$ but exponential in $(L/ε^2)\log(L/ε)$. It proves hardness results showing that, in general, this $ε$-dependence cannot be improved to polynomial time. It also gives positive results: agnostic learning for networks of modest complexity, and BoostNet, a boosting-based method that efficiently learns networks under constant-margin separability (with exponential dependence on $1/γ$). A simulation study on parity functions shows BoostNet outperforming standard backpropagation in noisy settings. Overall, the work delineates both the capabilities and the limitations of non-convex ERM for learning halfspaces and deep networks, linking complexity-theoretic barriers to margin-based learnability.

Abstract

We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $ε>0$. The time complexity is polynomial in the input dimension $d$ and the sample size $n$, but exponential in the quantity $(L/ε^2)\log(L/ε)$. These algorithms run multiple rounds of random initialization followed by arbitrary optimization steps. We further show that if the data is separable by some neural network with constant margin $γ>0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $Ω(γ)$. As a consequence, the algorithm achieves arbitrary generalization error $ε>0$ with ${\rm poly}(d,1/ε)$ sample and time complexity. We establish the same learnability result when the labels are randomly flipped with probability $η<1/2$.
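The "multiple rounds of random initialization followed by arbitrary optimization steps" scheme can be illustrated for a single halfspace. The following is a minimal sketch, not the paper's exact procedure: the ramp loss, the projected subgradient steps, and all function names and default parameters are assumptions chosen for illustration. It draws an initial weight vector uniformly from the unit sphere in each round, runs a few local steps that never return a worse point than the start, and keeps the best iterate across rounds.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ramp_loss(z, y, lipschitz):
    # Assumed loss: clip(1 - L*y*z, 0, 1). Bounded, non-convex, L-Lipschitz in z.
    return min(1.0, max(0.0, 1.0 - lipschitz * y * z))


def empirical_risk(w, X, Y, lipschitz):
    return sum(ramp_loss(dot(w, x), y, lipschitz) for x, y in zip(X, Y)) / len(X)


def random_unit_vector(d, rng):
    # A normalized Gaussian vector is uniform on the unit sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(g, g))
    return [a / norm for a in g]


def project_to_unit_ball(w):
    norm = math.sqrt(dot(w, w))
    return list(w) if norm <= 1.0 else [a / norm for a in w]


def local_steps(w, X, Y, lipschitz, steps, lr):
    # Projected subgradient descent; returns the best iterate seen,
    # so the result is never worse than the initial point.
    best_w, best_risk = w, empirical_risk(w, X, Y, lipschitz)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(X, Y):
            # The ramp is flat outside 0 < L*y*<w,x> < 1, so only
            # points inside that band contribute to the subgradient.
            if 0.0 < lipschitz * y * dot(w, x) < 1.0:
                for j, xj in enumerate(x):
                    grad[j] -= lipschitz * y * xj / n
        w = project_to_unit_ball([a - lr * g for a, g in zip(w, grad)])
        risk = empirical_risk(w, X, Y, lipschitz)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk


def random_restart_erm(X, Y, lipschitz=4.0, rounds=20, steps=50, lr=0.1, seed=0):
    # Keep the lowest-risk halfspace found over independent random restarts.
    rng = random.Random(seed)
    d = len(X[0])
    best_w, best_risk = None, float("inf")
    for _ in range(rounds):
        w0 = random_unit_vector(d, rng)
        w, risk = local_steps(w0, X, Y, lipschitz, steps, lr)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk
```

Because the ramp loss has zero subgradient wherever a point is badly misclassified, a single poor starting point can stall; this is one intuition for why the number of restarts, rather than the local optimizer, drives the guarantee. In this sketch the number of rounds is a free parameter and is not tied to the $(L/ε^2)\log(L/ε)$ bound stated in the abstract.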
