LoRA Training in the NTK Regime has No Spurious Local Minima
Uijeong Jang, Jason D. Lee, Ernest K. Ryu
TL;DR
This work analyzes LoRA fine-tuning of pretrained transformers in the NTK (lazy) regime. It proves that full fine-tuning admits a low-rank solution of rank $r$ satisfying $\frac{r(r+1)}{2}\le KN$, and that LoRA with rank $r$ satisfying $\frac{r(r+1)}{2}> KN$ eliminates spurious local minima, so gradient-based methods can find such low-rank solutions. It further shows that the low-rank solution found by LoRA generalizes well, via Rademacher-based bounds, and validates the theory with experiments across NLP, image, and speech tasks, observing convergence to a global optimum and rank-dependent training dynamics. The results provide upper-bound guarantees on trainability and generalization, connecting LoRA’s practical effectiveness to NTK-based optimization geometry and matrix-factorization techniques. Overall, the paper offers a principled explanation for LoRA’s success and motivates future work on refined rank bounds, local analysis beyond NTK, and computational tradeoffs in rank selection.
Abstract
Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLMs), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with $N$ data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank $r\lesssim \sqrt{N}$; (ii) using LoRA with rank $r\gtrsim \sqrt{N}$ eliminates spurious local minima, allowing gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
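To make the $\sqrt{N}$ scaling concrete, the sketch below solves the rank condition $\frac{r(r+1)}{2}\le KN$ from the TL;DR for the largest integer $r$, and returns the next integer as the smallest rank with $\frac{r(r+1)}{2}> KN$. It is an illustration only, not code from the paper: the function names are hypothetical, and $K$ is assumed here to be a fixed constant such as the model's output dimension, since this summary does not define it. Solving the quadratic gives $r\le\frac{-1+\sqrt{1+8KN}}{2}\le\sqrt{2KN}$, which is the $r\lesssim\sqrt{N}$ statement when $K$ is treated as constant.

```python
import math


def max_rank_bound(K: int, N: int) -> int:
    """Largest integer r with r(r+1)/2 <= K*N.

    K is assumed to be a fixed constant (e.g. the output dimension);
    N is the number of fine-tuning data points.
    """
    # r(r+1)/2 <= KN  <=>  r <= (-1 + sqrt(1 + 8KN)) / 2
    return (math.isqrt(8 * K * N + 1) - 1) // 2


def min_lora_rank(K: int, N: int) -> int:
    """Smallest integer r with r(r+1)/2 > K*N (hypothetical helper)."""
    return max_rank_bound(K, N) + 1


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        r_max = max_rank_bound(1, n)
        r_min = min_lora_rank(1, n)
        print(f"N={n}: r_max={r_max}, r_min={r_min}, sqrt(2N)={math.sqrt(2 * n):.1f}")
```

Under these assumptions, the two quantities differ by exactly one and both grow like $\sqrt{2KN}$, so a tenfold increase in $N$ raises the threshold rank by roughly a factor of $\sqrt{10}$.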
