Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training

Hyuntak Shin, Aecheon Jung, Sungeun Hong, Sunwoo Lee

TL;DR

This work tackles the rank-collapse problem in low-rank reparameterized neural networks by introducing a dynamic-rank training framework that intermittently inflates the weight rank with full-rank epochs and later deflates it via a trainable low-rank adaptor. The key idea is to interleave full-rank training in both the high-noise and low-noise phases of training, as determined by the learning-rate schedule, to restore the effective rank and recover representational capacity without paying the cost of full-rank training throughout. Empirical results across computer vision and NLP benchmarks show that dynamic-rank training reaches accuracy close to full-rank training while keeping computational cost comparable to conventional low-rank approaches. The method remains compatible with various decompositions and regularizers. The framework is presented as general, practical, and extensible, with detailed guidance on schedule choices and a discussion of future automation and integration with other efficiency techniques.
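The interleaving described above can be pictured as a per-epoch switch between low-rank and full-rank training. The following is a minimal sketch, not the authors' schedule: the helper `rank_mode`, the 100-epoch run length, and the window boundaries are all hypothetical values chosen for illustration, and the sketch assumes that an early window falls in the high-noise (large learning rate) phase and a late window falls in the low-noise phase.

```python
def rank_mode(epoch, windows):
    """Return 'full' if epoch lies in any [inflate, deflate) window, else 'low'.

    `windows` is a list of (inflate_epoch, deflate_epoch) pairs.
    Both the function and the window format are hypothetical.
    """
    for inflate_epoch, deflate_epoch in windows:
        if inflate_epoch <= epoch < deflate_epoch:
            return "full"
    return "low"


# Hypothetical 100-epoch run with two full-rank windows:
# one early (assumed high-noise) and one late (assumed low-noise).
windows = [(10, 20), (80, 90)]
modes = [rank_mode(epoch, windows) for epoch in range(100)]
full_epochs = modes.count("full")  # number of epochs trained at full rank
```

With these illustrative windows, only a fifth of the epochs run at full rank, which is how a schedule of this shape could keep the total cost close to that of low-rank training.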

Abstract

Low-rank training methods reduce the number of trainable parameters by re-parameterizing the weights with matrix decompositions (e.g., singular value decomposition). However, enforcing a fixed low-rank structure caps the rank of the weight matrices and can hinder the model's ability to learn complex patterns. Furthermore, the effective rank of the model's weights tends to decline during training, and this drop is accelerated when the model is reparameterized into a low-rank structure. In this study, we argue that strategically interleaving full-rank training epochs within low-rank training epochs can effectively restore the rank of the model's weights. Based on our findings, we propose a general dynamic-rank training framework that is readily applicable to a wide range of neural-network tasks. We first describe how to adjust the rank of the weight matrices to alleviate the inevitable rank collapse that arises during training, and then present extensive empirical results that validate our claims and demonstrate the efficacy of the proposed framework. Our empirical study shows that the proposed method incurs almost the same computational cost as SVD-based low-rank training while reaching accuracy comparable to full-rank training across various benchmarks.
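To make the parameter saving and the rank cap concrete, here is a minimal NumPy sketch of an SVD-based low-rank reparameterization. It is an illustration under stated assumptions rather than the paper's procedure: the function names `deflate` and `inflate`, the 64×32 layer shape, and the rank 8 are hypothetical, and truncated SVD is used as a stand-in for deflation, whereas the paper deflates through a trainable low-rank adaptor that this sketch does not reproduce.

```python
import numpy as np


def deflate(W, r):
    """Factor W (m x n) into A (m x r) and B (r x n) by truncated SVD.

    Stand-in for rank reduction; not the paper's trainable adaptor.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    A = U[:, :r] * sqrt_s          # scale columns of U
    B = sqrt_s[:, None] * Vt[:r]   # scale rows of V^T
    return A, B


def inflate(A, B):
    """Merge the two factors back into one dense matrix."""
    return A @ B


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # hypothetical dense layer weight

A, B = deflate(W, r=8)
W_low = inflate(A, B)

full_params = W.size                 # 64 * 32 entries
low_params = A.size + B.size         # 64 * 8 + 8 * 32 entries
rank_low = np.linalg.matrix_rank(W_low)
```

The factored form stores 768 numbers instead of 2048, but the product `A @ B` can never exceed rank 8, which is the cap on representational capacity that the abstract refers to. After inflation the dense matrix can be updated without that constraint.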

Paper Structure

This paper contains 20 sections, 10 equations, 5 figures, 11 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison of the layer-wise singular-value spectral ratio ($\lambda$) across different model ranks during ResNet20 training on CIFAR-10. The left plot shows layer-wise $\lambda$ curves for full-rank training, while the right plot shows those for SVD-based low-rank training. We omit the legend since there are too many layers. Throughout the whole training, most layers in the re-parameterized model exhibit large $\lambda$ values, indicating convergence to a low-rank space.
  • Figure 2: A schematic illustration of the dynamic-rank training framework.
  • Figure 3: CIFAR-10 (ResNet20) benchmark with various dynamic-rank schedules. Inflate and Deflate indicate the epochs at which the model rank is increased and decreased, respectively. The best accuracy is achieved when the full-rank epochs are located in both high-noise and low-noise regimes.
  • Figure 4: Parameter comparison between low-rank and dynamic-rank training. The heatmap shows the weights of the largest convolution layer in ResNet20 after training.
  • Figure 5: Comparison of the singular-value spectral ratio $\lambda$. The red dotted lines indicate the epochs at which the rank of the model weights is adjusted.
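The captions for Figures 1 and 5 rely on a singular-value spectral ratio $\lambda$ whose exact definition is not included in this excerpt. As a rough illustration only, the sketch below assumes a related quantity, the share of the largest singular value in the sum of all singular values. The name `top_singular_share` and this definition are assumptions and may differ from the paper's $\lambda$.

```python
import numpy as np


def top_singular_share(W):
    """Largest singular value divided by the sum of all singular values.

    Hypothetical proxy for a spectral ratio; the paper's definition of
    lambda is not given in this excerpt and may differ.
    """
    s = np.linalg.svd(W, compute_uv=False)  # sorted in descending order
    return s[0] / s.sum()


# Flat spectrum: all four singular values are equal.
flat = np.eye(4)

# Rank-1 matrix: a single nonzero singular value.
spiky = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0, 2.0])

share_flat = top_singular_share(flat)
share_spiky = top_singular_share(spiky)
```

Under this assumed definition, a matrix whose energy is spread evenly over its singular values scores low, while a matrix that has collapsed toward rank 1 scores close to 1, matching the captions' reading that large values indicate convergence to a low-rank space.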

Theorems & Definitions (2)

  • proof
  • proof