Curvature-Guided LoRA: Steering in the pretrained NTK subspace

Frédéric Zheng, Alexandre Proutière

Abstract

Parameter-efficient fine-tuning methods such as LoRA enable efficient adaptation of large pretrained models but often fall short of full fine-tuning performance. Existing approaches focus on aligning parameter updates, which only indirectly control model predictions. In this work, we introduce the prediction alignment problem, aiming to match the predictor obtained via PEFT to that of full fine-tuning at the level of outputs. We show that this objective naturally leads to a curvature-aware, second-order formulation, where optimal low-rank updates correspond to a Newton-like, curvature-whitened gradient. Based on this insight, we propose Curvature-Guided LoRA (CG-LoRA), which selects and scales adaptation directions using local curvature information. Our method is computationally efficient and avoids explicit second-order matrix construction. Preliminary experiments on standard natural language understanding benchmarks demonstrate improved performance and faster convergence compared to existing LoRA variants.
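To make the idea concrete, here is a minimal numerical sketch of a curvature-whitened, Newton-like low-rank update for a single linearized model under squared loss. It illustrates the general principle described in the abstract, not the paper's CG-LoRA procedure; the function name, the dense Gauss-Newton curvature, and the damping term are assumptions made for brevity.

```python
# Illustrative sketch only: a curvature-whitened ("Newton-like") low-rank update
# for a linearized model under squared loss. Dense matrices are used purely to
# keep the example short; the paper stresses avoiding explicit second-order matrices.
import numpy as np

def curvature_whitened_lowrank_step(J, Z, rank, damping=1e-3):
    """J: (n, p) Jacobian at the pretrained weights; Z: (n,) residuals Y - f(W0; X);
    rank: number of adaptation directions; damping: Tikhonov term (assumed)."""
    n, p = J.shape
    g = J.T @ Z / n                            # descent direction of the squared loss
    H = J.T @ J / n + damping * np.eye(p)      # damped Gauss-Newton curvature
    step = np.linalg.solve(H, g)               # Newton-like, curvature-whitened gradient
    _, eigvecs = np.linalg.eigh(H)             # local curvature spectrum (ascending order)
    top = eigvecs[:, -rank:]                   # keep the top-`rank` curvature directions
    return top @ (top.T @ step)                # project the whitened step onto them

# Toy usage: 32 samples, 10 flattened parameters, rank-2 update direction.
rng = np.random.default_rng(0)
J = rng.standard_normal((32, 10))
Z = rng.standard_normal(32)
delta = curvature_whitened_lowrank_step(J, Z, rank=2)
```

The dense eigendecomposition is only for illustration; the same directions could be obtained matrix-free (e.g., a few Lanczos iterations on Jacobian-vector products), in the spirit of the paper's claim that explicit second-order matrix construction is avoided.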


Paper Structure

This paper contains 39 sections, 13 theorems, 83 equations, 4 figures, 3 tables, and 2 algorithms.

Key Result

Proposition 4.1

Consider the squared loss $\ell(f(x), y)=\frac{1}{2}\Vert y-f(x)\Vert_2^2$, and let $Z=Y-f(W_0;X)$ denote the fine-tuning residuals.

(i) Full fine-tuning (FFT). The minimum-norm solution is ${\Delta W}^\star = \arg\min_{\Delta W}\,\mathcal{L}_n\bigl(f(W_0;X)+J_{f,W_0,X}\operatorname{vec}(\Delta W)\bigr)$.

(ii) LoRA. The LoRA Jacobian is defined as $J_{f,W_0,A_0,B_0,X} = J_{f,W_0,X}\,(I_{d_\mathrm{in}}\otimes B_0)$.
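As context for part (ii), the Kronecker form of the LoRA Jacobian follows from the standard vectorization identity, assuming the usual parameterization $\Delta W = B A$ with $B$ held at $B_0$ (an assumption made here for illustration):
\[
\operatorname{vec}(B_0 A) = (I_{d_\mathrm{in}} \otimes B_0)\,\operatorname{vec}(A),
\qquad\text{so}\qquad
f(W_0 + B_0 A; X) \approx f(W_0;X) + J_{f,W_0,X}\,(I_{d_\mathrm{in}} \otimes B_0)\operatorname{vec}(A).
\]
In other words, the first-order effect of the LoRA factor $A$ on the predictions lies in the range of $J_{f,W_0,X}(I_{d_\mathrm{in}}\otimes B_0)$, which is why aligning predictions, rather than parameter updates, constrains how the low-rank directions should be chosen.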

Figures (4)

  • Figure 1: Geometric illustration of the CG-LoRA algorithm
  • Figure 2: Performance of finetuned RoBERTa-base on CoLA.
  • Figure 3: Evaluation accuracy of finetuned RoBERTa-base on CoLA w.r.t. the learning rate.
  • Figure 4: Performance of finetuned T5-base on CoLA.

Theorems & Definitions (30)

  • Remark 3.1
  • Remark 3.2
  • Proposition 4.1
  • Corollary 4.2
  • Definition 4.4
  • Proposition 4.5
  • Remark 4.6
  • Proposition 5.1
  • Definition 5.2: Balanced realizations
  • Theorem 5.3
  • ...and 20 more