OP-LoRA: The Blessing of Dimensionality

Piotr Teterwak, Kate Saenko, Bryan A. Plummer, Ser-Nam Lim

TL;DR

OP-LoRA addresses optimization fragility in low-rank adapters by overparameterizing adapter generation with one MLP per layer. The MLP takes a learned per-layer embedding $z$ and outputs the low-rank factors $A$ and $B$ via $[A; B] = W_2(\text{ReLU}(W_1 z + b_1)) + b_2$, allowing faster training while keeping inference costs unchanged. Empirically, OP-LoRA yields consistent gains on matrix-factorization proxies and large-scale tasks, including improvements of up to $\approx 15$ CMMD points in Stable Diffusion fine-tuning and improved accuracy on vision-language and LLaVA benchmarks. This work demonstrates that leveraging the blessing of dimensionality through overparameterization can significantly ease optimization for parameter-efficient fine-tuning without sacrificing deployment efficiency.

Abstract

Low-rank adapters enable fine-tuning of large models with only a small number of parameters, thus reducing storage costs and minimizing the risk of catastrophic forgetting. However, they often pose optimization challenges, with poor convergence. To overcome these challenges, we introduce an overparameterized approach that accelerates training without increasing inference costs. This method reparameterizes low-rank adaptation by employing a separate MLP and learned embedding for each layer. The learned embedding is input to the MLP, which generates the adapter parameters. Such overparameterization has been shown to implicitly function as an adaptive learning rate and momentum, accelerating optimization. At inference time, the MLP can be discarded, leaving behind a standard low-rank adapter. To study the effect of MLP overparameterization on a small yet difficult proxy task, we implement it for matrix factorization, and find it achieves faster convergence and lower final loss. Extending this approach to larger-scale tasks, we observe consistent performance gains across domains. We achieve improvements in vision-language tasks and especially notable increases in image generation, with CMMD scores improving by up to 15 points.
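
The reparameterization described above can be sketched in a few lines of NumPy. This is a forward-pass illustration only, not the authors' implementation: the layer sizes, the initialization scale, the helper name `generate_adapter`, and the convention that the adapter update is $BA$ (with $A$ of shape rank × input and $B$ of shape output × rank) are all assumptions made for the example. Training, which would update the embedding and the MLP parameters by backpropagation, is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
d_out, d_in, rank = 8, 6, 2      # shape of the adapted layer and the adapter rank
embed_dim, hidden_dim = 4, 32    # size of the learned embedding z and the MLP hidden layer

# Parameters that would be trained: the embedding z and the MLP (W1, b1, W2, b2).
z = rng.normal(size=embed_dim)
W1 = rng.normal(size=(hidden_dim, embed_dim)) * 0.1
b1 = np.zeros(hidden_dim)
n_adapter = rank * d_in + d_out * rank          # total entries in A and B
W2 = rng.normal(size=(n_adapter, hidden_dim)) * 0.1
b2 = np.zeros(n_adapter)


def generate_adapter(z, W1, b1, W2, b2):
    """Compute W2 ReLU(W1 z + b1) + b2 and split the result into A and B."""
    h = np.maximum(W1 @ z + b1, 0.0)            # ReLU hidden activations
    flat = W2 @ h + b2                          # flat vector holding all adapter entries
    A = flat[: rank * d_in].reshape(rank, d_in)
    B = flat[rank * d_in:].reshape(d_out, rank)
    return A, B


W0 = rng.normal(size=(d_out, d_in))             # frozen base weight
x = rng.normal(size=d_in)                       # one input vector

# Training-time forward pass: the adapter is produced by the MLP.
A, B = generate_adapter(z, W1, b1, W2, b2)
y_train = W0 @ x + B @ (A @ x)

# Inference: store only A and B; z and the MLP parameters are no longer needed.
A_inf, B_inf = A.copy(), B.copy()
y_infer = W0 @ x + B_inf @ (A_inf @ x)

print("outputs match:", np.allclose(y_train, y_infer))
print("parameters kept at inference:", A_inf.size + B_inf.size)
```

In this sketch the MLP holds far more parameters than the adapter it produces, yet the deployed model carries only the entries of `A_inf` and `B_inf`, which is the sense in which inference cost matches that of a standard low-rank adapter.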

Paper Structure

This paper contains 18 sections, 18 equations, 23 figures, 8 tables.

Figures (23)

  • Figure 1: Overparameterizing low-rank adapters at training time improves performance. (a) A standard Low-Rank Adapter learns two rank-reduced matrices (A and B) that are added to the frozen base weights. (b) Our proposed OP-LoRA predicts the adapter weights from an MLP and a learned embedding. (c) Visual results of image generation on Stable Diffusion XL: images generated with our method show qualitative improvements over standard LoRA and are more faithful to the text prompt. (d) Performance in vision-language tasks, with OP-LoRA showing accuracy gains over LoRA across training epochs.
  • Figure 2: Comparison of Mean Squared Error (MSE) loss convergence between matrix factorization with and without MLP-reparameterization. Different learning rates are represented by a blue-to-light blue gradient for MLP models and an orange-to-light orange gradient for standard models. The red line represents the SVD solution. Four MLP solutions reach the reconstruction error achieved with SVD, while none of the standard ones do.
  • Figure 3: Loss Curve
  • Figure 4: Gradient Consistency
  • Figure 5: Gradient Norm
  • ...and 18 more figures
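
The SVD reference line in Figure 2 can be illustrated with a small sketch. A rank-r truncated SVD gives the lowest reconstruction error any rank-r factorization can reach (the Eckart-Young theorem), so it is a natural floor for the loss curves. The code below computes that floor and, for comparison, runs plain gradient descent on two factors without any MLP reparameterization. The matrix size, rank, learning rate, and step count are arbitrary choices for illustration and are not the paper's experimental settings; the MLP-reparameterized variant is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative problem: factorize a random 20 x 15 matrix at rank 3.
m, n, rank = 20, 15, 3
M = rng.normal(size=(m, n))


def mse(target, approx):
    """Mean squared reconstruction error."""
    return float(np.mean((target - approx) ** 2))


# Best possible rank-r reconstruction: keep the top `rank` singular triplets.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_svd = (U[:, :rank] * s[:rank]) @ Vt[:rank]
svd_mse = mse(M, M_svd)

# Standard factorization M ~ P @ Q, trained with plain gradient descent
# on the objective 0.5 * ||P @ Q - M||_F^2.
P = rng.normal(size=(m, rank)) * 0.1
Q = rng.normal(size=(rank, n)) * 0.1
init_mse = mse(M, P @ Q)
lr = 0.01
for _ in range(500):
    R = P @ Q - M            # residual
    grad_P = R @ Q.T         # gradient with respect to P
    grad_Q = P.T @ R         # gradient with respect to Q
    P -= lr * grad_P
    Q -= lr * grad_Q
gd_mse = mse(M, P @ Q)

print(f"initial MSE:          {init_mse:.4f}")
print(f"gradient descent MSE: {gd_mse:.4f}")
print(f"truncated SVD MSE:    {svd_mse:.4f}")
```

The gradient-descent error can approach the SVD error but never drop below it, which is why the figure uses the SVD solution as the reference against which both parameterizations are judged.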