Algebraic Representations for Faster Predictions in Convolutional Neural Networks

Johnny Joyce, Jan Verschelde

TL;DR

This work develops algebraic representations for CNNs with skip connections to enable fast prediction-time inference. For linear CNNs, the authors prove that an arbitrarily deep, skip-connected network can be pre-computed into an affine map $f(X)=W X + B$, yielding substantial speedups by reducing inference to a single-layer-like operation. Extending to nonlinear networks, they introduce a homotopy approach that gradually removes skip connections during training, achieving measurable prediction-time gains (e.g., up to 22%–46% speedups in experiments) while preserving accuracy. Applied to ResNet34 on MNIST, these results demonstrate practical pathways to combine deep expressive models with the computational efficiency of shallow predictors, with future work aiming to unify linear and nonlinear results via algebraic-geometric tools and optimized scheduling of skip-connection strength.
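The collapse into $f(X)=W X + B$ can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the authors' construction: each layer is a dense square matrix rather than a convolution (a convolution is also a linear map, so it could in principle be written as a matrix), and every layer has a single skip connection of strength `t` from its own input. The names `forward_layered` and `collapse_to_affine` are hypothetical.

```python
import numpy as np

def forward_layered(x, weights, biases, t=1.0):
    """Layer-by-layer pass of a toy linear network in which every layer
    adds a skip connection of strength t: h <- W h + b + t * h."""
    h = x
    for W, b in zip(weights, biases):
        h = W @ h + b + t * h
    return h

def collapse_to_affine(weights, biases, t=1.0):
    """Pre-compute (W_total, b_total) so that the whole network equals
    W_total @ x + b_total. Assumes square layers of equal size."""
    n = weights[0].shape[1]
    W_total = np.eye(n)
    b_total = np.zeros(n)
    for W, b in zip(weights, biases):
        A = W + t * np.eye(n)       # fold the skip connection into the layer
        W_total = A @ W_total       # compose the linear parts
        b_total = A @ b_total + b   # push the accumulated bias through
    return W_total, b_total

# Toy data (assumed sizes, random weights standing in for a trained model).
rng = np.random.default_rng(0)
n, depth = 6, 8
weights = [rng.normal(scale=0.3, size=(n, n)) for _ in range(depth)]
biases = [rng.normal(size=n) for _ in range(depth)]

W_total, b_total = collapse_to_affine(weights, biases)

x = rng.normal(size=n)
deep_out = forward_layered(x, weights, biases)   # depth matrix products
shallow_out = W_total @ x + b_total              # one matrix product
max_abs_diff = float(np.max(np.abs(deep_out - shallow_out)))
```

The pre-computation is paid once after training. At prediction time each input then costs one matrix-vector product regardless of `depth`, which is the source of the speedup in the linear case.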

Abstract

Convolutional neural networks (CNNs) are a popular choice of model for tasks in computer vision. When CNNs are made with many layers, resulting in a deep neural network, skip connections may be added to create an easier gradient optimization problem while retaining model expressiveness. In this paper, we show that arbitrarily complex, trained, linear CNNs with skip connections can be simplified into a single-layer model, resulting in greatly reduced computational requirements during prediction time. We also present a method for training nonlinear models with skip connections that are gradually removed throughout training, giving the benefits of skip connections without requiring computational overhead during prediction time. These results are demonstrated with practical examples on the Residual Network (ResNet) architecture.
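For the nonlinear case, the idea of a skip connection that can be removed gradually can be sketched with a strength parameter $t$ that scales the skip term. The block below is an illustrative assumption rather than the paper's ResNet34 code: it uses dense layers with ReLU, and the names `residual_block` and `plain_block` are hypothetical. It shows that at $t = 0$ the block coincides with a plain block without any skip connection, so nothing extra has to be computed at prediction time.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2, t):
    """Two dense layers with a skip connection scaled by t.
    t = 1 is a standard residual block, t = 0 has no skip term."""
    return relu(W2 @ relu(W1 @ x) + t * x)

def plain_block(x, W1, W2):
    """The same two layers with the skip connection deleted entirely."""
    return relu(W2 @ relu(W1 @ x))

# Toy weights and input (assumed sizes, random values).
rng = np.random.default_rng(1)
n = 5
W1 = rng.normal(size=(n, n))
W2 = rng.normal(size=(n, n))
x = rng.normal(size=n)

out_full = residual_block(x, W1, W2, t=1.0)     # skip fully active
out_removed = residual_block(x, W1, W2, t=0.0)  # skip strength driven to 0
out_plain = plain_block(x, W1, W2)              # architecture without skips
```

Once training has driven `t` to 0, `out_removed` and `out_plain` agree, so the trained weights can be served with the cheaper skip-free forward pass.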

Paper Structure (15 sections, 1 theorem, 11 equations, 3 figures)

Key Result

Theorem 3.1

Let ${f_W^{(i)}(X) \coloneqq W^{(i)} P^{(i)} \left(\sum_{k=0}^{i-1} t^{(k,i-1)} R^{(k,i-1)} f^{(k)}(X) \right)}$ be the same as in (eq:skiprecursive) with all bias terms removed. Also take $\mathbb{1}_{hw}$ to be the identity matrix of size $hw \times hw$, and take $\vec{0}$ to be the zero vector of [...] where $f$ is the map given by a CNN with $L$ layers and arbitrarily many skip connections, and where [...]

Figures (3)

  • Figure 1: Classification accuracy for varying values of $t$ (skip connection strength) in ResNet34. 10 epochs were performed over the MNIST dataset in each trial, and the mean result over 5 trials was taken for each point shown in the scatter plot.
  • Figure 2: Classification accuracy for varying values of $t$ (skip connection strength) in ResNet34. 2 epochs were performed over the MNIST dataset in each trial, and the mean result over 5 trials was taken for each point shown in the scatter plot.
  • Figure 3: Classification accuracy on the MNIST validation set across 10 epochs for a ResNet34 model with (a) fixed $t$ (skip connection strength), (b) scheduled values of $t$ that decrease to 0, and (c) scheduled values of $t$ that start at 0.5 and can be applied to other models. Each point shown is the mean over 5 trials.
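
The scheduled values of $t$ in Figure 3 can be pictured with a small helper that produces one skip connection strength per epoch. The exact schedule used in the experiments is not given here, so the linear decay below is an assumption chosen for simplicity, and `linear_skip_schedule` is a hypothetical name.

```python
def linear_skip_schedule(num_epochs, t_start=1.0, t_end=0.0):
    """Return one skip connection strength per epoch, moving linearly
    from t_start (first epoch) to t_end (last epoch).
    The linear shape is an assumption, not the paper's schedule."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    if num_epochs == 1:
        return [t_end]
    span = t_end - t_start
    return [t_start + span * epoch / (num_epochs - 1)
            for epoch in range(num_epochs)]

# Two variants over 10 epochs: starting from full strength,
# and starting from 0.5 as in panel (c).
schedule_from_one = linear_skip_schedule(10, t_start=1.0)
schedule_from_half = linear_skip_schedule(10, t_start=0.5)
```

In a training loop, the value for the current epoch would be passed as the skip strength of every residual block, so that the final epoch trains the network with its skip connections switched off.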

Theorems & Definitions (2)

  • Claim 3.1
  • Theorem 3.1