Convex Formulations for Training Two-Layer ReLU Neural Networks
Karthik Prakhya, Tolga Birdal, Alp Yurtsever
TL;DR
The paper tackles the non-convexity of training two-layer ReLU neural networks by reformulating the problem as a convex completely positive program in a lifted space. It proves that this formulation is exactly equivalent to the original problem at a finite critical width and extends it to infinite width via a measure-based representation. Because the complete positivity constraint keeps the program NP-hard, the paper introduces a polynomial-time SDP relaxation that replaces the completely positive cone with the doubly non-negative cone while preserving the problem's core structure. Experiments on synthetic and real datasets indicate that the relaxation is reasonably tight and yields test accuracy competitive with Neural Network Gaussian Processes and Neural Tangent Kernels; a rounding step extracts trained weights from the SDP solution. The work offers a principled, lifting-based, width-agnostic framework for neural network training, with insights into the role of width, the potential of convex relaxations, and possible future quantum or rank-bound approaches to copositive programs.
Abstract
Solving non-convex, NP-hard optimization problems is crucial for training machine learning models, including neural networks. However, non-convexity often leads to black-box machine learning models with unclear inner workings. While convex formulations have been used for verifying neural network robustness, their application to training neural networks remains less explored. In response to this challenge, we reformulate the problem of training infinite-width two-layer ReLU networks as a convex completely positive program in a finite-dimensional (lifted) space. Despite the convexity, solving this problem remains NP-hard due to the complete positivity constraint. To overcome this challenge, we introduce a semidefinite relaxation that can be solved in polynomial time. We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test accuracy across a range of classification tasks.
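To make the relaxation step concrete, the following is a minimal sketch (not the authors' implementation) of the two cones involved. A matrix is completely positive if it can be written as B Bᵀ with B entrywise non-negative; deciding this is hard in general. The doubly non-negative cone, which a semidefinite relaxation of this kind typically uses, only requires positive semidefiniteness and non-negative entries, both cheap to check. The helper name `is_doubly_nonnegative`, the tolerance, and the example matrices are assumptions chosen for illustration.

```python
import numpy as np


def is_doubly_nonnegative(X, tol=1e-9):
    """Return True if X is symmetric, PSD, and entrywise non-negative (up to tol).

    Hypothetical helper for illustration; not part of the paper's code.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        return False
    if not np.allclose(X, X.T, atol=tol):
        return False
    # eigvalsh is the symmetric eigenvalue routine; its smallest value tests PSD-ness.
    min_eig = np.linalg.eigvalsh(X).min()
    return bool(min_eig >= -tol and X.min() >= -tol)


# A completely positive matrix by construction: B has non-negative entries.
rng = np.random.default_rng(0)
B = rng.random((4, 6))
X_cp = B @ B.T

# PSD but with a negative off-diagonal entry, so not doubly non-negative.
X_psd_only = np.array([[1.0, -0.5], [-0.5, 1.0]])

# Entrywise non-negative but with eigenvalues +1 and -1, so not PSD.
X_nonneg_only = np.array([[0.0, 1.0], [1.0, 0.0]])

results = {
    "completely positive": is_doubly_nonnegative(X_cp),
    "PSD only": is_doubly_nonnegative(X_psd_only),
    "non-negative only": is_doubly_nonnegative(X_nonneg_only),
}
```

Every completely positive matrix passes this check, which is why swapping the complete positivity constraint for the doubly non-negative one enlarges the feasible set and gives a relaxation. The converse does not hold in general, so the relaxed optimum may differ from the exact one; the tightness experiments mentioned in the abstract measure this gap.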
