
Harnessing Orthogonality to Train Low-Rank Neural Networks

Daniel Coquelin, Katharina Flügel, Marie Weiel, Nicholas Kiefer, Charlotte Debus, Achim Streit, Markus Götz

TL;DR

This work introduces OIALR training, a novel training method that exploits the intrinsic orthogonality of neural networks. It integrates seamlessly into existing training workflows with minimal accuracy loss and can surpass conventional training setups, including those of state-of-the-art models.

Abstract

This study explores the learning dynamics of neural networks by analyzing the singular value decomposition (SVD) of their weights throughout training. Our investigation reveals that an orthogonal basis within each multidimensional weight's SVD representation stabilizes during training. Building upon this, we introduce Orthogonality-Informed Adaptive Low-Rank (OIALR) training, a novel training method exploiting the intrinsic orthogonality of neural networks. OIALR seamlessly integrates into existing training workflows with minimal accuracy loss, as demonstrated by benchmarking on various datasets and well-established network architectures. With appropriate hyperparameter tuning, OIALR can surpass conventional training setups, including those of state-of-the-art models.
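The SVD view in the abstract can be illustrated with a minimal NumPy sketch, assuming a single dense weight matrix. It factors the weight as U S V^T and keeps only the top-r components. Once the orthogonal bases U and V^T are held fixed, only the small r×r mixing matrix S would need training. The helper name `factorize`, the chosen rank, and the synthetic weight are hypothetical. This illustrates a generic truncated-SVD parameterization, not the authors' OIALR implementation.

```python
import numpy as np

def factorize(weight: np.ndarray, rank: int):
    """Split a weight into orthonormal bases and an r x r mixing matrix.

    Hypothetical helper illustrating a U @ S @ Vt parameterization;
    not the authors' OIALR code.
    """
    # Thin SVD: weight = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(weight, full_matrices=False)
    U_r, Vt_r = U[:, :rank], Vt[:rank, :]
    # Initialize the mixing matrix as the diagonal of the top-r singular values.
    S = np.diag(s[:rank])
    return U_r, S, Vt_r

rng = np.random.default_rng(0)
# Synthetic 64 x 32 weight of exact rank 4 plus small noise.
W = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
W += 1e-3 * rng.standard_normal((64, 32))

U_r, S, Vt_r = factorize(W, rank=4)
W_approx = U_r @ S @ Vt_r
rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)

# If only S is trained while U_r and Vt_r stay frozen (an assumption here),
# the trainable parameter count shrinks from m*n to r*r.
full_params = W.size  # 64 * 32
trainable = S.size    # 4 * 4
print(f"relative error {rel_err:.1e}, trainable {trainable}/{full_params}")
```

Because the synthetic weight is nearly rank 4, the rank-4 reconstruction error is tiny. For real network weights, the rank would have to be chosen by some criterion, which this sketch does not attempt.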

Paper Structure

This paper contains 14 sections, 10 equations, 4 figures, 9 tables, and 2 algorithms.

Figures (4)

  • Figure 1: Analysis of the linear mixing Euclidean similarity and orthogonal basis stability for ResNet and ViT models during ImageNet-2012 training. Stability is defined by Eq. (eq:stability) and Euclidean similarity by Eq. (eq:dist). For both metrics, higher values denote smaller changes between steps. Both metrics compare the network's current parameters with those from five epochs prior. The x-axis denotes the training epoch, and the y-axis denotes the network layer (input layers at the top). Mean stability and similarity are shown below each heatmap.
  • Figure 2: Training of a ViT-B/16 network on ImageNet-2012 over 125 epochs.
  • Figure 3: Learning rate schedules for baseline and OIALR training of a mini ViT on CIFAR-10. The OIALR learning rate schedule was determined by hyperparameter search.
  • Figure 4: MSE and the percentage of trainable parameters relative to the full-rank model for the Autoformer trained on the ETTm2 dataset using two different prediction lengths in 15 time steps.
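The Figure 1 metrics compare a layer's current parameters against a checkpoint from five epochs earlier. The definitions in eq:stability and eq:dist are not reproduced here, so the NumPy sketch below substitutes stand-in definitions that follow the caption's convention that higher means less change. For the orthogonal basis it uses a subspace-overlap score, and for the singular values it uses 1 / (1 + Frobenius distance). The functions `basis_stability`, `euclidean_similarity`, and `compare_checkpoints` and the simulated checkpoints are hypothetical. They should not be read as the paper's formulas.

```python
import numpy as np

def basis_stability(U_old: np.ndarray, U_new: np.ndarray) -> float:
    """Subspace overlap in [0, 1]; 1 means the column spaces coincide.

    Hypothetical stand-in, not the paper's eq:stability. Using the squared
    Frobenius norm of U_old^T U_new makes it invariant to SVD sign flips.
    """
    k = U_old.shape[1]
    return float(np.linalg.norm(U_old.T @ U_new, "fro") ** 2 / k)

def euclidean_similarity(S_old: np.ndarray, S_new: np.ndarray) -> float:
    """Map the Frobenius distance to (0, 1]; 1 means no change (hypothetical)."""
    return float(1.0 / (1.0 + np.linalg.norm(S_new - S_old, "fro")))

def compare_checkpoints(W_old: np.ndarray, W_new: np.ndarray, rank: int):
    """Return (basis stability, singular-value similarity) for the top-`rank` SVD parts."""
    U0, s0, _ = np.linalg.svd(W_old, full_matrices=False)
    U1, s1, _ = np.linalg.svd(W_new, full_matrices=False)
    stab = basis_stability(U0[:, :rank], U1[:, :rank])
    sim = euclidean_similarity(np.diag(s0[:rank]), np.diag(s1[:rank]))
    return stab, sim

rng = np.random.default_rng(1)
# Simulated checkpoints: the "current" weight is a small perturbation of the older one.
W_prev = rng.standard_normal((64, 32))
W_curr = W_prev + 0.01 * rng.standard_normal((64, 32))

stab, sim = compare_checkpoints(W_prev, W_curr, rank=8)
print(f"stability {stab:.3f}, similarity {sim:.3f}")
```

Applying such a comparison per layer and per epoch would yield a layer-by-epoch grid like the heatmaps described for Figure 1, though the actual values depend on the paper's own definitions.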