Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training
Hyuntak Shin, Aecheon Jung, Sungeun Hong, Sunwoo Lee
TL;DR
This work tackles rank collapse in low-rank reparameterized neural networks with a dynamic-rank training framework that intermittently inflates the weight rank through full-rank epochs and later deflates it via a trainable low-rank adaptor. The key idea is to interleave full-rank training in both the high-noise and low-noise phases of training, guided by the learning-rate schedule, to restore the effective rank and recover representational capacity without paying the cost of full-rank training throughout. Empirical results on computer vision and NLP benchmarks show that dynamic-rank training reaches accuracy close to full-rank training at a computational cost comparable to conventional low-rank approaches, and that it remains compatible with various decompositions and regularizers. The framework is presented as general, practical, and extensible, with detailed guidance on schedule choices and a discussion of future automation and integration with other efficiency techniques.
Abstract
Low-rank training methods reduce the number of trainable parameters by re-parameterizing the weights with matrix decompositions (e.g., singular value decomposition). However, enforcing a fixed low-rank structure caps the rank of the weight matrices and can hinder the model's ability to learn complex patterns. Furthermore, the effective rank of the model's weights tends to decline during training, and this decline accelerates when the model is reparameterized into a low-rank structure. In this study, we argue that strategically interleaving full-rank training epochs with low-rank training epochs can effectively restore the rank of the model's weights. Based on our findings, we propose a general dynamic-rank training framework that is readily applicable to a wide range of neural-network tasks. We first describe how to adjust the rank of the weight matrices to alleviate the rank collapse that inevitably arises during training, and then present extensive empirical results that validate our claims and demonstrate the efficacy of the proposed framework. Our empirical study shows that the proposed method incurs almost the same computational cost as SVD-based low-rank training while achieving accuracy comparable to full-rank training across various benchmarks.
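To make the two notions used above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one common entropy-based definition of effective rank (the exponential of the Shannon entropy of the normalized singular values), which may differ from the measure used in the paper, and it builds the low-rank factors by truncated SVD. The function names `effective_rank` and `low_rank_factors`, the matrix shape, and the rank `r=4` are illustrative choices.

```python
import numpy as np


def effective_rank(W, eps=1e-12):
    """Entropy-based effective rank: exp(H(p)), where p are the singular
    values of W normalized to sum to one. (Assumed definition.)"""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries so log(p) is finite
    return float(np.exp(-(p * np.log(p)).sum()))


def low_rank_factors(W, r):
    """Rank-r reparameterization W ~= A @ B obtained by truncated SVD.
    The singular values are split evenly between the two factors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:r])
    A = U[:, :r] * root          # shape (m, r)
    B = root[:, None] * Vt[:r]   # shape (r, n)
    return A, B


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # stand-in for a full-rank weight matrix
A, B = low_rank_factors(W, r=4)

print("parameters, full vs. factored:", W.size, A.size + B.size)
print("effective rank, full vs. factored:",
      effective_rank(W), effective_rank(A @ B))
```

The factored form stores 64·4 + 4·32 = 384 values instead of 2048, and its effective rank cannot exceed the chosen rank of 4. This is the cap on representational capacity that the abstract refers to.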
