Efficient Deep Learning with Decorrelated Backpropagation

Sander Dalm; Joshua Offergeld; Nasir Ahmad; Marcel van Gerven

TL;DR

The paper tackles the high energy cost of training deep neural networks by introducing decorrelated backpropagation (DBP), which decorrelates the inputs to every layer using a learned per-layer decorrelation matrix. DBP combines a decorrelation loss with the standard backpropagation (BP) objective, derives an efficient update rule for the decorrelation matrices, and applies decorrelation patch-wise so that it scales to modern convolutional architectures. Empirical results on ImageNet with AlexNet and ResNet variants show that DBP can shorten wall-clock training time substantially (up to ~50%) while reaching higher test accuracy, which in turn reduces estimated carbon emissions. The findings indicate that decorrelation improves convergence speed and is practical at scale, although DBP requires careful hyperparameter tuning and attention to its computational overhead; future work may explore sparser or low-rank decorrelation and a broader range of architectures.
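To make the idea of a learned decorrelation matrix concrete, the following is a minimal NumPy sketch of an iterative decorrelation update on toy data. It is an illustration under stated assumptions, not the authors' implementation: the exact form of the update (an off-diagonal term `C` that removes correlations, a variance term `V` that pushes variances toward 1, mixed by `kappa`, and averaged over the batch) is assumed here, chosen to be consistent with the paper's description of $\kappa=0$ as decorrelation and $\kappa=0.5$ as whitening. The function names, the mixing matrix, and the convention `Z = X @ R.T` are hypothetical.

```python
import numpy as np


def make_correlated_data(n=1000, seed=0):
    """Zero-mean toy data with two correlated covariates (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, 0.0], [0.8, 0.6]])  # assumed mixing matrix, corr ~ 0.8
    return rng.standard_normal((n, 2)) @ A.T


def second_moment(Z):
    """Batch-averaged second-moment matrix; equals the covariance for zero-mean data."""
    return Z.T @ Z / Z.shape[0]


def decorrelation_step(R, X, eps=1e-3, kappa=0.0):
    """One batch update of the decorrelation matrix R (assumed update form).

    X has shape (n_samples, d); the transformed inputs are Z = X @ R.T.
    kappa = 0 only removes off-diagonal correlations; kappa > 0 additionally
    drives the variances toward 1 (whitening).
    """
    S = second_moment(X @ R.T)
    C = S - np.diag(np.diag(S))        # off-diagonal part: correlations to remove
    V = np.diag(np.diag(S) - 1.0)      # deviation of each variance from 1
    G = (1.0 - kappa) * C + kappa * V
    return R - eps * G @ R


def fit_decorrelation(X, steps=1000, eps=1e-3, kappa=0.0):
    """Run the update repeatedly, starting from the identity matrix."""
    R = np.eye(X.shape[1])
    for _ in range(steps):
        R = decorrelation_step(R, X, eps=eps, kappa=kappa)
    return R
```

Calling `fit_decorrelation(X, kappa=0.0)` and inspecting `second_moment(X @ R.T)` should show the off-diagonal entries shrinking toward zero, while `kappa=0.5` should also pull the diagonal toward 1. With a small learning rate such as `eps=1e-3`, several thousand steps may be needed on this toy problem.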

Abstract

The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant computational cost and therefore a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, to date, this has not yet translated into substantial improvements in training efficiency in large-scale DNNs. This is mainly caused by the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of deep convolutional neural networks is feasible by embracing decorrelated backpropagation as a mechanism for learning. To achieve this goal we made use of a novel algorithm which induces network-wide input decorrelation using minimal computational overhead. By combining this algorithm with careful optimizations, we achieve a more than two-fold speed-up and higher test accuracy compared to backpropagation when training several deep residual networks. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.

Paper Structure (21 sections, 9 equations, 11 figures)

This paper contains 21 sections, 9 equations, 11 figures.

Introduction
Methods
Decorrelated backpropagation
Decorrelation learning rule
Decorrelating deep convolutional neural networks
Decorrelating convolutional layers
Subsampling during covariance estimation
Efficient matrix products
Experimental validation
Results
DBP effectively decorrelates inputs to all network layers
DBP converges much faster than BP
DBP training yields shorter wall-clock times
DBP reduces carbon emission
Impact of whitening
...and 6 more sections

Figures (11)

Figure 1: Demonstration of the decorrelation rule on correlated input data consisting of 1000 examples and two covariates with decorrelation learning rate $\epsilon = 0.001$. a) Decorrelation using $\kappa=0$. b) Whitening using $\kappa=0.5$. Mean variance and covariance reported for different iterations.
Figure 2: Implementation of decorrelated backpropagation in residual networks. a) Residual blocks as implemented by our networks. b) Patch-wise flattening of the inputs with a flattened dimension of $d = M \times M \times C_\text{in}$. c) Decorrelating/whitening transform of the data by decorrelation matrix $\mathbf{R}$. d) $1 \times 1$ convolution operation with the weights $\mathbf{W}$ on the decorrelated input patches.
Figure 3: Input decorrelation when training a ResNet18 model on ImageNet for 50 epochs. Network layers are ordered from left to right and from top to bottom. Panel titles indicate the layer type.
Figure 4: Train and test performance of BP and DBP on ImageNet as a function of epochs for different deep neural network architectures. AlexNet was trained for 20 epochs and the deeper ResNet models were trained for 35 epochs. a) AlexNet. b) ResNet18. c) ResNet34. d) ResNet50.
Figure 5: Train and test performance of BP and DBP on ImageNet as a function of wall-clock time for different deep neural network architectures. AlexNet was trained for 20 epochs and the deeper ResNet models were trained for 35 epochs. a) AlexNet. b) ResNet18. c) ResNet34. d) ResNet50.
...and 6 more figures
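
The pipeline described in the Figure 2 caption (patch-wise flattening to dimension $d = M \times M \times C_\text{in}$, multiplication by the decorrelation matrix, then a $1 \times 1$ convolution) can be sketched in NumPy as follows. This is a simplified illustration rather than the paper's implementation: it assumes a single image in channels-last layout, no padding, and a flattening order of (row, column, channel) within each patch; the function names and the `stride` parameter are hypothetical.

```python
import numpy as np


def extract_patches(x, M, stride=1):
    """Flatten M x M patches of x (H, W, C_in) into vectors of length d = M*M*C_in.

    Returns an array of shape (H_out, W_out, d); no padding is applied.
    """
    C = x.shape[2]
    # Window axes are appended at the end: (H-M+1, W-M+1, C_in, M, M).
    windows = np.lib.stride_tricks.sliding_window_view(x, (M, M), axis=(0, 1))
    windows = windows[::stride, ::stride]
    windows = windows.transpose(0, 1, 3, 4, 2)  # -> (H_out, W_out, M, M, C_in)
    return windows.reshape(windows.shape[0], windows.shape[1], M * M * C)


def decorrelated_conv(x, R, W, M, stride=1):
    """Patch-wise decorrelation followed by a 1 x 1 convolution.

    R: (d, d) decorrelation matrix; W: (C_out, d) weights of the 1 x 1 convolution.
    """
    patches = extract_patches(x, M, stride)  # (H_out, W_out, d)
    z = patches @ R.T                        # decorrelate every flattened patch
    return z @ W.T                           # 1 x 1 conv is a linear map per position
```

Because both steps are linear, the result equals an ordinary convolution with effective weights `W @ R`. With `R` set to the identity, `decorrelated_conv` reduces to a standard $M \times M$ convolution whose kernel is `W` reshaped to `(C_out, M, M, C_in)`.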
