Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
Sungyoon Kim, Mert Pilanci
TL;DR
The paper analyzes the training of regularized two-layer ReLU networks through a Gaussian-relaxed convex reformulation built on random hyperplane arrangements. It proves that, under Gaussian data and mild conditions, the relative gap between the non-convex optimum $p^{*}$ and the relaxed optimum $\tilde{p}^{*}$ scales as $O(\sqrt{\log n})$. It introduces a polynomial-time randomized algorithm with complexity $O(d^{3}m^{3})$ that achieves this approximation, and it shows that local gradient methods converge with high probability to stationary points with low training loss, which helps explain their empirical effectiveness. The analysis combines convex duality with Gordon’s comparison, relates the gap to the cone sharpness $\mathcal{C}$, and extends the guarantees from unconstrained to constrained relaxations; a MAX-CUT interpretation provides additional insight. Together, these results give principled guarantees for convex relaxations of ReLU networks and a polynomial-time route to approximately globally optimal solutions.
Abstract
In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.
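The convex relaxation discussed above is built from ReLU activation patterns induced by randomly drawn hyperplanes. The following is a minimal sketch of that sampling step only, assuming NumPy and standard Gaussian directions; the function name `sample_arrangement_patterns` and the toy data are hypothetical, and this is not the authors' full algorithm (no convex program is solved here).

```python
import numpy as np

def sample_arrangement_patterns(X, num_samples, seed=0):
    """Sample distinct ReLU activation patterns 1[X g >= 0].

    Each Gaussian direction g defines a hyperplane through the origin;
    the pattern records which training points lie on its non-negative side.
    (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    G = rng.standard_normal((d, num_samples))   # one direction per column
    patterns = (X @ G >= 0).astype(np.uint8)    # shape (n, num_samples)
    # Different directions can induce the same pattern; keep each one once.
    return np.unique(patterns.T, axis=0)        # shape (k, n), k <= num_samples

# Toy Gaussian data: n = 20 samples in d = 3 dimensions.
X = np.random.default_rng(1).standard_normal((20, 3))
P = sample_arrangement_patterns(X, num_samples=50)
print(P.shape)  # (k, 20), where k is the number of distinct sampled patterns
```

Each row of `P` corresponds to one diagonal 0/1 matrix that a relaxed convex program of this kind would use in place of enumerating every possible arrangement.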
