Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Yongchang Hao, Yanshuai Cao, Lili Mou
TL;DR
Flora tackles the memory bottleneck in training large neural networks by reframing LoRA as gradient compression via random projections and introducing a sublinear-memory mechanism. By repeatedly resampling projection matrices, Flora enables high-rank updates while keeping optimization-state memory near sublinear in model size. Empirical results across summarization and translation tasks show Flora matches or closely approaches full-matrix updates with substantial memory savings, outperforming LoRA in many settings. The approach is complementary to existing memory-saving techniques and scalable to various architectures, offering practical benefits for training larger models efficiently.
Abstract
Despite large neural networks demonstrating remarkable abilities to complete different tasks, they require excessive memory usage to store the optimization states for training. To alleviate this, the low-rank adaptation (LoRA) is proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts overall weight update matrices to be low-rank, limiting the model performance. In this work, we investigate the dynamics of LoRA and identify that it can be approximated by a random projection. Based on this observation, we propose Flora, which is able to achieve high-rank updates by resampling the projection matrices while enjoying the sublinear space complexity of optimization states. We conduct experiments across different tasks and model architectures to verify the effectiveness of our approach.
