CRONOS: Enhancing Deep Learning with Scalable GPU Accelerated Convex Neural Networks

Miria Feng, Zachary Frangella, Mert Pilanci

TL;DR

CRONOS provably converges to the global minimum of the convex reformulation under mild assumptions, and experiments show that CRONOS-AM can obtain comparable or better validation accuracy than predominant tuned deep learning optimizers on vision and language tasks with benchmark datasets such as ImageNet and IMDb.

Abstract

We introduce the CRONOS algorithm for convex optimization of two-layer neural networks. CRONOS is the first algorithm capable of scaling to high-dimensional datasets such as ImageNet, which are ubiquitous in modern deep learning. This significantly improves upon prior work, which has been restricted to downsampled versions of MNIST and CIFAR-10. Taking CRONOS as a primitive, we then develop a new algorithm called CRONOS-AM, which combines CRONOS with alternating minimization, to obtain an algorithm capable of training multi-layer networks with arbitrary architectures. Our theoretical analysis proves that CRONOS converges to the global minimum of the convex reformulation under mild assumptions. In addition, we validate the efficacy of CRONOS and CRONOS-AM through extensive large-scale numerical experiments with GPU acceleration in JAX. Our results show that CRONOS-AM can obtain comparable or better validation accuracy than predominant tuned deep learning optimizers on vision and language tasks with benchmark datasets such as ImageNet and IMDb. To the best of our knowledge, CRONOS is the first algorithm which utilizes the convex reformulation to enhance performance on large-scale learning tasks.

Paper Structure

This paper contains 38 sections, 6 theorems, 39 equations, 8 figures, 5 tables, 6 algorithms.

Key Result

Proposition 3.1

Define the matrices $F_i = D_i X$ and $G_i = (2D_i-I)X$ for $i\in [P]$. Then, by introducing the constraints $u_i = v_i$ and $z_i = w_i$ for $i\in [P]$, together with appropriate slack variables $s_1,\dots,s_P, t_1,\dots,t_P$, eq:mlp_cvx can be reformulated as an equivalent constrained problem.
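To make the definitions in Proposition 3.1 concrete, the following sketch builds $F_i$ and $G_i$ for a small random $X$. This excerpt does not define $D_i$, so the sketch assumes the usual construction from the convex reformulation of two-layer ReLU networks, $D_i = \mathrm{diag}(\mathbb{1}[X h_i \geq 0])$ for random vectors $h_i$. The function name `build_F_G`, the Gaussian sampling of $h_i$, and the problem sizes are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def build_F_G(X, P, seed=0):
    """Return stacked F_i = D_i X and G_i = (2 D_i - I) X for i = 1..P.

    Assumption (not stated in this excerpt): D_i = diag(1[X h_i >= 0])
    for random Gaussian vectors h_i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = rng.standard_normal((d, P))       # hypothetical sampling vectors h_i
    D = (X @ H >= 0).astype(X.dtype)      # shape (n, P); column i is the diagonal of D_i
    # D_i is diagonal, so D_i X only rescales the rows of X.
    # Broadcasting avoids ever forming an n x n matrix.
    F = D.T[:, :, None] * X[None, :, :]                 # shape (P, n, d)
    G = (2.0 * D.T[:, :, None] - 1.0) * X[None, :, :]   # each row of X times +1 or -1
    return F, G

X = np.random.default_rng(1).standard_normal((6, 3))
F, G = build_F_G(X, P=4)
```

Under these assumptions $G_i = 2F_i - X$, so only the 0/1 activation patterns need to be stored, and both matrices can be applied to a vector without materializing $D_i$.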

Figures (8)

  • Figure 1: CRONOS-AM vs. competitors on Deep ReLU MLP
  • Figure 2: CRONOS vs. AdamW on two GPT2 configurations for IMDb
  • Figure 3: Results for ImageNet-171 and Food-5k
  • Figure 4: Training a CNN on ImageNet-171
  • Figure 5: CRONOS-AM vs. competitors on Deep ReLU MLP (Seed 2)
  • ...and 3 more figures

Theorems & Definitions (9)

  • Proposition 3.1
  • Proposition 6.1: $F_i$ and $G_i$ are approximately low-rank if $X$ is
  • Proposition 6.3: Fast solution of $u$-subproblem
  • Theorem 6.4: Convergence and Computational Complexity of CRONOS
  • proof
  • Lemma C.1: Effective dimension under polynomial decay.
  • proof
  • Theorem C.2: Simplified Theorem 1, frangella2023linear
  • proof