Harnessing Orthogonality to Train Low-Rank Neural Networks

Daniel Coquelin; Katharina Flügel; Marie Weiel; Nicholas Kiefer; Charlotte Debus; Achim Streit; Markus Götz

TL;DR

OIALR training is a novel training method that exploits the intrinsic orthogonality of neural networks. It integrates seamlessly into existing training workflows with minimal accuracy loss and can surpass conventional training setups, including those of state-of-the-art models.

Abstract

This study explores the learning dynamics of neural networks by analyzing the singular value decomposition (SVD) of their weights throughout training. Our investigation reveals that an orthogonal basis within each multidimensional weight's SVD representation stabilizes during training. Building upon this, we introduce Orthogonality-Informed Adaptive Low-Rank (OIALR) training, a novel training method exploiting the intrinsic orthogonality of neural networks. OIALR seamlessly integrates into existing training workflows with minimal accuracy loss, as demonstrated by benchmarking on various datasets and well-established network architectures. With appropriate hyperparameter tuning, OIALR can surpass conventional training setups, including those of state-of-the-art models.
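
To make the idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation, of a low-rank linear layer in which the orthogonal bases U and V from a weight's SVD are held fixed and only a small r × r mixing matrix S would be trained. The rank rule (keep singular values of at least `rel_threshold` times the largest), the restriction to 2-D weights, and all names (`LowRankMixingLayer`, `rel_threshold`, `trainable_fraction`) are assumptions made for illustration; the paper's actual rank selection and training schedule are not described in this section.

```python
import numpy as np


class LowRankMixingLayer:
    """Hypothetical low-rank linear layer computing y = x @ (U S V^T)^T.

    U (out x r) and V (in x r) are orthonormal bases taken from the SVD of a
    full-rank 2-D weight and are kept frozen; only the r x r mixing matrix S
    would be updated by the optimizer (no training loop is shown here).
    """

    def __init__(self, weight: np.ndarray, rel_threshold: float = 0.1):
        if weight.ndim != 2:
            raise ValueError("this sketch only handles 2-D weights")
        u, s, vt = np.linalg.svd(weight, full_matrices=False)
        # Assumed rank rule: keep singular values >= rel_threshold * largest
        # (NumPy returns singular values in descending order).
        r = int(np.sum(s >= rel_threshold * s[0]))
        self.u = u[:, :r]          # frozen orthogonal basis (out x r)
        self.v = vt[:r, :].T       # frozen orthogonal basis (in x r)
        self.s = np.diag(s[:r])    # trainable mixing matrix, starts diagonal
        self.full_params = weight.size

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x @ W^T = x @ V @ S^T @ U^T, evaluated without forming W.
        return ((x @ self.v) @ self.s.T) @ self.u.T

    def trainable_fraction(self) -> float:
        # Trainable entries (r * r) relative to the full-rank weight (out * in).
        return self.s.size / self.full_params


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = LowRankMixingLayer(rng.standard_normal((64, 32)), rel_threshold=0.5)
    print(layer.s.shape, f"{100 * layer.trainable_fraction():.2f}% trainable")
```

Because S is r × r, the number of trainable entries in this sketch falls from out × in to r², which is the kind of ratio reported as the percentage of trainable parameters relative to the full-rank model.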

Paper Structure (14 sections, 10 equations, 4 figures, 9 tables, 2 algorithms)

Related work
Observing orthogonality in neural network training
Orthogonality-Informed Adaptive Low-Rank Training
Experiments
  Computational environment
  Vision Transformer on ImageNet-2012
  Comparison with related low-rank and sparse training methods
  Ablation study on mini ViT on CIFAR-10
  Ablation study on Autoformer on ETTm2
Conclusion
Experiment Hyperparameters
  ImageNet-2012
  Mini-ViT on CIFAR-10
  Autoformer on ETTm2

Figures (4)

Figure 1: Analysis of the Euclidean similarity of the linear mixing matrix and the stability of the orthogonal basis for ResNet and ViT models during ImageNet-2012 training. Stability and Euclidean similarity are each defined by an equation in the main text; for both metrics, higher values denote smaller changes between steps. Both metrics compare the network's current parameters with those from five epochs earlier. The x-axis denotes the training epoch and the y-axis the network layer (input layers at the top). Mean stability and similarity are shown below each heatmap.
Figure 2: Training of a ViT-B/16 network on ImageNet-2012 over 125 epochs.
Figure 3: Learning rate schedules for baseline and OIALR training of a mini ViT on CIFAR-10. The OIALR learning rate schedule was determined by hyperparameter search.
Figure 4: MSE and the percentage of trainable parameters relative to the full-rank model for the Autoformer trained on the ETTm2 dataset using two different prediction lengths in 15 time steps.
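
Figure 1 compares each layer's parameters with those from five epochs earlier. The exact stability and Euclidean-similarity definitions are given by equations not reproduced in this section, so the NumPy sketch below substitutes hypothetical stand-ins: stability as the mean absolute cosine similarity between corresponding top-k left singular vectors, and Euclidean similarity as 1 / (1 + ‖A - B‖_F). Both are chosen only so that, as in the caption, higher values mean less change. The helper names (`basis_stability`, `euclidean_similarity`, `LaggedComparison`) are illustrative and do not come from the paper.

```python
from collections import deque
from typing import Optional

import numpy as np


def _as_matrix(w: np.ndarray) -> np.ndarray:
    # Flatten multidimensional (e.g. convolutional) weights to out x rest.
    return w.reshape(w.shape[0], -1)


def basis_stability(w_old: np.ndarray, w_new: np.ndarray, k: int) -> float:
    """Hypothetical score in [0, 1]; 1.0 means the top-k left singular vectors are unchanged."""
    u_old = np.linalg.svd(_as_matrix(w_old), full_matrices=False)[0]
    u_new = np.linalg.svd(_as_matrix(w_new), full_matrices=False)[0]
    if k > u_old.shape[1]:
        raise ValueError("k exceeds the number of singular vectors")
    # Columns of U are unit vectors, so each dot product is a cosine similarity;
    # abs() removes the sign ambiguity of singular vectors.
    cos = np.abs(np.sum(u_old[:, :k] * u_new[:, :k], axis=0))
    return float(cos.mean())


def euclidean_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical similarity in (0, 1]; 1.0 means the arrays are identical."""
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))


class LaggedComparison:
    """Store one weight snapshot per epoch and compare with the one `lag` epochs earlier."""

    def __init__(self, lag: int = 5):
        self.history: deque = deque(maxlen=lag + 1)

    def update(self, weight: np.ndarray, k: int) -> Optional[float]:
        self.history.append(np.array(weight, copy=True))
        if len(self.history) < self.history.maxlen:
            return None  # not enough epochs recorded yet
        # history[0] is exactly `lag` epochs older than history[-1].
        return basis_stability(self.history[0], self.history[-1], k)
```

Calling `update` once per epoch with a layer's weight returns None for the first five epochs and a stability score afterwards, mirroring the five-epoch lag described for Figure 1.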
