Unveiling Hidden Convexity in Deep Learning: a Sparse Signal Processing Perspective

Emi Zeger, Mert Pilanci

Abstract

Deep neural networks (DNNs), particularly those using Rectified Linear Unit (ReLU) activation functions, have achieved remarkable success across diverse machine learning tasks, including image recognition, audio processing, and language modeling. Despite this success, the non-convex nature of DNN loss functions complicates optimization and limits theoretical understanding. In this paper, we highlight how recently developed convex equivalences of ReLU NNs and their connections to sparse signal processing models can address the challenges of training and understanding NNs. Recent research has uncovered several hidden convexities in the loss landscapes of certain NN architectures, notably two-layer ReLU networks and other deeper or varied architectures. This paper seeks to provide an accessible and educational overview that bridges recent advances in the mathematics of deep learning with traditional signal processing, encouraging broader signal processing applications.

Paper Structure

This paper contains 2 sections, 1 theorem, 44 equations, 6 figures, and 1 table.

Key Result

Theorem 1

The non-convex training problem (eq:nn) with weight decay regularization for a two-layer ReLU network is equivalent to a convex group Lasso problem whose constraint sets are $\mathcal{K}_{(g)}=\{\mathbf{z}_{(g)}: (2\mathbf{D}_{(g)}-\mathbf{I})\mathbf{X}^T\mathbf{z}_{(g)}\geq 0\}$, provided the number of neurons satisfies $m \geq m^*$, where $m^*$ is the number of non-zero $\mathbf{u}_{(g)},\mathbf{v}_{(g)}$.
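
The constraint sets in Theorem 1 can be made concrete with a small numerical sketch. The code below is an illustration only, not the authors' implementation. It assumes that $\mathbf{X}$ has shape $(d, n)$ with one column per training sample (so that $\mathbf{X}^T\mathbf{z}$ matches the theorem's notation), and that each $\mathbf{D}_{(g)}$ is a diagonal 0/1 matrix recording which samples activate a ReLU neuron. The function names `sample_activation_patterns` and `in_cone` are hypothetical, and the patterns are collected by random sampling rather than exhaustive enumeration.

```python
import numpy as np

def sample_activation_patterns(X, num_samples=200, seed=0):
    """Collect distinct ReLU activation patterns 1[X^T w >= 0] from random w.

    Assumption: X has shape (d, n), one column per training sample.
    Random sampling may miss some patterns; it is not an exhaustive search.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.standard_normal((d, num_samples))
    patterns = (X.T @ W >= 0).T  # boolean array, shape (num_samples, n)
    return np.unique(patterns, axis=0)

def in_cone(X, pattern, z, tol=1e-9):
    """Check the constraint (2D - I) X^T z >= 0 with D = diag(pattern)."""
    signs = 2.0 * pattern.astype(float) - 1.0  # diagonal entries of 2D - I
    return bool(np.all(signs * (X.T @ z) >= -tol))

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 5))          # d = 2 features, n = 5 samples
patterns = sample_activation_patterns(X)

# A weight vector always lies in the cone of its own activation pattern.
w = rng.standard_normal(2)
pattern_w = X.T @ w >= 0
w_in_own_cone = in_cone(X, pattern_w, w)

# Inside that cone the ReLU acts linearly: D X^T w equals relu(X^T w).
relu_is_linear = np.allclose(pattern_w * (X.T @ w), np.maximum(X.T @ w, 0.0))
```

Within a fixed cone the ReLU output is linear in the weights, which is what allows the training problem to be posed over the cones with a group Lasso penalty.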

Figures (6)

  • Figure 1: Illustration of a separating hyperplane (left plot) and an activation chamber (right plot).
  • Figure 2: Zonotope example. Lines indicate normal cones.
  • Figure 3: Zonotope normal cones
  • Figure 4: Wedge product in geometric algebra [pilancicomplexity].
  • Figure 5: Experimental results comparing $2$-layer NN training with non-convex and convex formulations.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 1: [pilanci2020neural]