OP-LoRA: The Blessing of Dimensionality
Piotr Teterwak, Kate Saenko, Bryan A. Plummer, Ser-Nam Lim
TL;DR
OP-LoRA addresses the optimization fragility of low-rank adapters by overparameterizing adapter generation with one MLP per layer. Each MLP maps a learned embedding $z$ to the low-rank factors, $[A, B] = W_2\,\text{ReLU}(W_1 z + b_1) + b_2$, which speeds up training while keeping inference costs unchanged, since the MLP is discarded after training. Empirically, OP-LoRA yields consistent gains across matrix-factorization proxies and large-scale tasks, including improvements of up to roughly 15 CMMD points in Stable Diffusion fine-tuning and improved accuracy on vision-language and LLaVA benchmarks. This work demonstrates that leveraging the blessing of dimensionality through overparameterization can significantly ease optimization for parameter-efficient fine-tuning without sacrificing deployment efficiency.
Abstract
Low-rank adapters enable fine-tuning of large models with only a small number of parameters, reducing storage costs and the risk of catastrophic forgetting. However, they are often difficult to optimize and converge poorly. To overcome these challenges, we introduce an overparameterized approach that accelerates training without increasing inference costs. This method reparameterizes low-rank adaptation by employing a separate MLP and learned embedding for each layer. The learned embedding is the input to the MLP, which generates the adapter parameters. Such overparameterization has been shown to act implicitly as an adaptive learning rate and momentum, accelerating optimization. At inference time, the MLP can be discarded, leaving behind a standard low-rank adapter. To study the effect of MLP overparameterization on a small yet difficult proxy task, we implement it for matrix factorization and find that it achieves faster convergence and lower final loss. Extending this approach to larger-scale tasks, we observe consistent performance gains across domains. We achieve improvements on vision-language tasks and especially notable gains in image generation, with CMMD scores improving by up to 15 points.
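The reparameterization described in the abstract can be illustrated with a minimal NumPy sketch for a single layer. It shows only the forward computation and the export step; the training loop is omitted. The layer sizes, initialization scales, the helper names `generate_adapter` and `adapted_forward`, and the convention that the update is $BA$ with $A$ of shape rank × d_in are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for one adapted layer (assumed, not from the paper).
d_out, d_in, rank = 8, 6, 2       # frozen weight W0 has shape (d_out, d_in)
embed_dim, hidden_dim = 4, 32     # the MLP is far larger than the adapter it generates

# Number of entries in A (rank x d_in) plus B (d_out x rank).
n_adapter = rank * d_in + d_out * rank

# Trainable during fine-tuning: the per-layer embedding z and the MLP weights.
z = rng.normal(size=embed_dim)
W1 = rng.normal(size=(hidden_dim, embed_dim)) * 0.1
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(n_adapter, hidden_dim)) * 0.1
b2 = np.zeros(n_adapter)


def generate_adapter(z, W1, b1, W2, b2):
    """Run the two-layer MLP and split its flat output into the factors A and B."""
    hidden = np.maximum(W1 @ z + b1, 0.0)   # ReLU(W1 z + b1)
    flat = W2 @ hidden + b2                 # W2 ReLU(...) + b2
    A = flat[: rank * d_in].reshape(rank, d_in)
    B = flat[rank * d_in:].reshape(d_out, rank)
    return A, B


def adapted_forward(x, W0, A, B):
    """Standard low-rank adapter forward pass: frozen weight plus update B @ A."""
    return W0 @ x + B @ (A @ x)


W0 = rng.normal(size=(d_out, d_in))   # random stand-in for a frozen pretrained weight
x = rng.normal(size=d_in)

# Training-time view: the factors are regenerated from the MLP at each step.
A, B = generate_adapter(z, W1, b1, W2, b2)
y_train_view = adapted_forward(x, W0, A, B)

# Inference-time view: store only A and B; z and the MLP weights are discarded.
A_export, B_export = A.copy(), B.copy()
y_inference = adapted_forward(x, W0, A_export, B_export)

mlp_params = z.size + W1.size + b1.size + W2.size + b2.size
adapter_params = A_export.size + B_export.size
print(f"trainable parameters during fine-tuning: {mlp_params}")
print(f"parameters kept for inference: {adapter_params}")
```

In this sketch the exported factors reproduce the training-time output exactly, and the deployed layer is an ordinary rank-2 adapter. The extra MLP parameters exist only during fine-tuning.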
