GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
Xutao Liao, Shaohui Li, Yuhui Xu, Zhi Li, Yu Liu, You He
TL;DR
GaLore$+$ tackles the time bottleneck of SVD-based low-rank projection in LLM fine-tuning. It introduces cross-head projection, which shares projection matrices across attention heads to reduce the cost of estimating them, and uses randomized subspace iteration for faster SVD. It further mitigates low-rank approximation error through sparsely coded residuals applied to the optimizer moments and weight updates, with a warm-up phase to build an efficient sparse indexing scheme. On arithmetic reasoning and natural language generation benchmarks, GaLore$+$ outperforms vanilla GaLore while fine-tuning approximately $4\times$ faster. The approach thus offers a practical, memory-efficient way to adapt large LLMs with strong task performance and reduced training time.
Abstract
Recent low-rank training methods, such as GaLore, have significantly reduced the memory required to optimize large language models (LLMs). However, these methods often spend much of their time estimating the low-rank projections. In particular, the singular value decomposition (SVD) in GaLore can consume more than 80\% of the total training time. To address this issue, we propose GaLore$+$, which uses cross-head low-rank projection to reduce the time spent estimating low-rank projections for multi-head attention. In addition, we employ randomized subspace iteration to achieve fast SVD. To further enhance performance, we propose sparsely coded residuals to reduce the errors that low-rank approximation introduces into the first- and second-order moments of the optimizers and into the weight updates. We evaluate GaLore$+$ on arithmetic reasoning and natural language generation datasets. Our experiments demonstrate that GaLore$+$ delivers superior performance while fine-tuning approximately $4\times$ faster than vanilla GaLore.
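As a rough illustration of the fast-SVD step mentioned in the abstract, the following is a minimal NumPy sketch of generic randomized subspace iteration for estimating a rank-$r$ projection from a gradient matrix. It is not the authors' implementation: the function name `randomized_subspace_projection` and the parameters `n_iter`, `oversample`, and `seed` are hypothetical, and cross-head sharing and the sparsely coded residuals are not modeled here.

```python
import numpy as np


def randomized_subspace_projection(grad, rank, n_iter=2, oversample=4, seed=0):
    """Approximate the top-`rank` left singular vectors of `grad`.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = grad.shape
    k = min(rank + oversample, m, n)
    # Sample the range of grad with a Gaussian test matrix and orthonormalize.
    q, _ = np.linalg.qr(grad @ rng.standard_normal((n, k)))
    for _ in range(n_iter):
        # Subspace iteration: multiply by grad.T and grad, re-orthonormalizing
        # after each product to keep the basis numerically stable.
        z, _ = np.linalg.qr(grad.T @ q)
        q, _ = np.linalg.qr(grad @ z)
    # A small SVD in the reduced k-dimensional space orders the directions.
    u_small, s, _ = np.linalg.svd(q.T @ grad, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank]


# Toy gradient with exact rank 8 (an assumption made so the check is clean).
rng = np.random.default_rng(1)
grad = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 48))

proj, sigma = randomized_subspace_projection(grad, rank=8)
low_rank_grad = proj.T @ grad      # 8 x 48 projected gradient
recon = proj @ low_rank_grad       # projected back to 64 x 48
print(np.linalg.norm(grad - recon) / np.linalg.norm(grad))
```

The full SVD of the $m \times n$ gradient is replaced by a few matrix products and QR factorizations on $k$-column matrices, plus one SVD of a $k \times n$ matrix, which is where the time saving would come from when $k \ll \min(m, n)$.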
