HRP: High-Rank Preheating for Superior LoRA Initialization
Yuzhu Chen, Yingjie Wang, Shi Fu, Li Shen, Yongcheng Jing, Xinmei Tian, Dacheng Tao
TL;DR
This work shows that LoRA fine-tuning is highly sensitive to initialization and that random initialization schemes can prevent training from reaching the best low-rank approximation of the target change $M=W^{\text{target}}-W^{\text{init}}$. By analyzing the gradient flow of Asymmetric and Classic LoRA, the authors prove that a well-chosen initialization yields exponential convergence to the optimal rank-$r$ solution, whereas random initialization can trap training in suboptimal regions. They propose High-Rank Preheating (HRP), which runs a few steps of LoRA at a higher rank so that the product $BA^\top$ approximates the main singular directions of $M$. It then uses the leading singular vectors of $BA^\top$ to initialize the main rank-$r$ fine-tuning. Theoretical bounds show that HRP lowers the expected loss, especially when the target has low effective rank. Empirically, HRP outperforms other initialization strategies on NLU and NLG tasks and achieves results comparable to full-parameter fine-tuning with negligible extra memory, supporting its practicality for resource-constrained fine-tuning of large models.
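For a squared Frobenius-norm objective $\|BA^\top - M\|_F^2$ (an assumption made here for illustration), the "optimal rank-$r$ solution" is the truncated SVD of $M$, by the Eckart-Young theorem. The NumPy sketch below computes it and checks that the approximation error equals the norm of the discarded singular values. The helper name `best_rank_r` and the toy matrix are hypothetical.

```python
import numpy as np

def best_rank_r(M, r):
    """Best rank-r approximation of M in Frobenius norm (Eckart-Young):
    keep the r largest singular values of M and their singular vectors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the first r left singular vectors by their singular values, then
    # recombine with the matching right singular vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r]

# Toy "target change" M with singular values 3, 2, 1 (illustrative values only).
M = np.diag([3.0, 2.0, 1.0])
M1 = best_rank_r(M, 1)
err = np.linalg.norm(M - M1)  # Frobenius error = sqrt(2**2 + 1**2) = sqrt(5)
```

Any rank-$r$ matrix, including whatever $BA^\top$ a LoRA run ends at, has Frobenius error at least `err` for the same `r`. A suboptimal initialization shows up as a gap above this floor.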
Abstract
This paper studies the crucial impact of initialization in Low-Rank Adaptation (LoRA). Through theoretical analysis, we demonstrate that the fine-tuned result of LoRA is highly sensitive to initialization and is likely to converge to suboptimal low-rank results. This issue can be mitigated by adjusting the initial direction towards the main singular vectors of the target $\Delta W$. However, $\Delta W$ is typically unknown in real-world scenarios. To approximate this initial direction, we propose High-Rank Preheating (HRP), which first trains LoRA with a higher preheating rank for a few steps. It then uses the main singular vectors of the resulting $BA^\top$ as initialization for the main fine-tuning process. With only a modification in the initial direction, we prove that HRP makes LoRA achieve better fine-tuned results than random initialization in expectation, and that the improvement grows with the preheating rank. We validate our theoretical findings through extensive experiments across various models and tasks, where HRP significantly enhances LoRA's effectiveness and outperforms other initialization strategies and LoRA variants.
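The two-stage procedure above can be sketched on a toy matrix-approximation problem. This is a minimal, hypothetical sketch and not the authors' implementation. It makes three assumptions. First, it uses the proxy loss $\frac{1}{2}\|BA^\top - M\|_F^2$ with a known target $M$, only so that the result can be checked. In practice $\Delta W$ is unknown, and preheating runs on the actual fine-tuning loss. Second, it uses plain gradient descent. Third, HRP is taken to set $A$ to the top-$r$ right singular vectors of the preheated $BA^\top$ and to reset $B$ to zero, so that the starting weights are unchanged. The names `preheat`, `hrp_init`, `rank_pre`, and `scale` are illustrative.

```python
import numpy as np

def preheat(M, rank_pre, steps=500, lr=0.05, seed=0):
    """Toy preheating stage (assumption): gradient descent on the proxy loss
    0.5 * ||B @ A.T - M||_F^2 with a rank-`rank_pre` LoRA pair (B, A)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    B = np.zeros((m, rank_pre))                                 # LoRA-style zero init for B
    A = rng.normal(scale=1.0 / np.sqrt(n), size=(n, rank_pre))  # random init for A
    for _ in range(steps):
        G = B @ A.T - M                  # residual = dL/d(B A^T)
        grad_B, grad_A = G @ A, G.T @ B  # chain rule through B @ A.T
        B -= lr * grad_B
        A -= lr * grad_A
    return B, A

def hrp_init(M, rank_pre, rank, scale=1.0, **kw):
    """Hypothetical HRP-style init: preheat at rank `rank_pre`, then take the
    top-`rank` right singular vectors of the preheated B @ A.T as the new A."""
    B_pre, A_pre = preheat(M, rank_pre, **kw)
    _, _, Vt = np.linalg.svd(B_pre @ A_pre.T, full_matrices=False)
    A0 = scale * Vt[:rank].T             # (n, rank): main directions found by preheating
    B0 = np.zeros((M.shape[0], rank))    # zero B keeps the initial weights unchanged (assumption)
    return B0, A0

# Toy target with low effective rank: two large singular values, the rest small.
rng = np.random.default_rng(1)
n = 30
U_true, _ = np.linalg.qr(rng.normal(size=(n, n)))
V_true, _ = np.linalg.qr(rng.normal(size=(n, n)))
s = np.full(n, 0.1)
s[:2] = [3.0, 2.0]
M = U_true @ np.diag(s) @ V_true.T

B0, A0 = hrp_init(M, rank_pre=8, rank=2)
# Fraction of the initial A's column space lying in M's top-2 right singular subspace.
overlap = np.linalg.norm(V_true[:, :2].T @ A0) ** 2 / 2
A_rand = np.linalg.qr(rng.normal(size=(n, 2)))[0]  # random orthonormal baseline
overlap_rand = np.linalg.norm(V_true[:, :2].T @ A_rand) ** 2 / 2
```

On this toy target, which has a large spectral gap, `overlap` should be close to 1. A random orthonormal $A$ overlaps with the top-2 subspace by only about $r/n$ in expectation. This mirrors the abstract's point that preheating mainly corrects the initial direction.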
