Parameter-Efficient Subspace Optimization for LLM Fine-Tuning

Yuchen Lou, Zeqi Ye, Minshuo Chen

TL;DR

This work reframes parameter-efficient fine-tuning of large language models as a subspace minimization problem, introducing PESO to unify LoRA-like methods with principled optimization. It couples exploration of new subspaces guided by full gradients with exploitation inside current subspaces via SVD-based representations, enabling stronger convergence guarantees in the full-parameter space. The paper presents PESO-LoRA-R and PESO-LoRA-T as practical instantiations, achieving improved performance on GLUE, reasoning and code tasks while maintaining memory efficiency. Theoretical results show convergence to stationary points under full gradient restart, with exact convergence possible when subspaces align with full gradients, and empirical evidence across NLP benchmarks supports the method's effectiveness and robustness.

Abstract

This paper develops a new perspective on parameter-efficient fine-tuning for LLMs, inspired by the classical theory of subspace minimization. We introduce a unifying framework, Parameter-Efficient Subspace Optimization (PESO), which not only recovers many existing methods such as LoRA but also connects them to the principled algorithmic and theoretical foundations of subspace optimization. This connection highlights a natural "exploration-exploitation" view of subspace methods, guiding the design of new algorithms that achieve strong convergence performance while preserving memory efficiency. Importantly, our framework establishes convergence in the full-parameter space, closing a critical gap in LoRA variants, whose low-rank updates lack such guarantees. We further instantiate the framework as a practical algorithm, PESO-LoRA, based on LoRA-type parameterization. Our algorithm achieves notable improvements over existing methods on standard benchmarks.

Paper Structure

This paper contains 41 sections, 5 theorems, 51 equations, 1 figure, 11 tables, 6 algorithms.

Key Result

Theorem 5.1

Suppose all assumptions hold. With full gradient restart, the iterates $\{W_k\}$ generated by Algorithm \ref{alg:generic_subspace} satisfy $\liminf_{k\to\infty} \mathbb{E}[\|G_k\|] \leq \lim_{k\to\infty} \delta_k$.
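
One way to read this bound, as a sketch rather than a restatement of the paper's proof: the definition of $\delta_k$ is not reproduced in this summary, and $G_k$ is read here as the full gradient at $W_k$ (an assumption, consistent with the TL;DR's "convergence to stationary points"). Since a norm is nonnegative, the bound sandwiches the limit inferior whenever $\delta_k$ vanishes:

$$
\lim_{k\to\infty} \delta_k = 0
\quad\Longrightarrow\quad
0 \;\leq\; \liminf_{k\to\infty} \mathbb{E}[\|G_k\|] \;\leq\; \lim_{k\to\infty} \delta_k \;=\; 0
\quad\Longrightarrow\quad
\liminf_{k\to\infty} \mathbb{E}[\|G_k\|] = 0 .
$$

If instead $\delta_k$ converges to some $\delta > 0$, the same inequality only guarantees that the expected gradient norm falls to at most $\delta$ along a subsequence.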

Figures (1)

  • Figure 1: Comparison of full-parameter tuning, LoRA, and our method (PESO-LoRA). Left: MetaMathQA. Middle: synthetic example $\min_{W}\|W-M\|_F^2$ with $M = 10 \cdot \operatorname{diag}(1,\dots,1,0,\dots,0)$ ($r{+}1$ ones); see Appendix \ref{sec:synthetic}. Right: optimization trajectories. PESO-LoRA bridges the loss gap of LoRA while preserving memory and computation efficiency.
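
The synthetic example in the caption can be reproduced numerically. The NumPy sketch below is illustrative only and is not the paper's PESO-LoRA algorithm: the sizes `n = 8` and `r = 3`, the zero initialization, and the helper names `best_rank_r`, `synthetic_target`, and `loss` are all assumptions made for the illustration. It shows why a single fixed rank-$r$ subspace leaves a loss gap on this target (the target has rank $r+1$, so by the Eckart-Young theorem the best rank-$r$ fit leaves one singular value of 10 unexplained), and how one further rank-$r$ step along the residual, which is proportional to the negative full gradient $-2(W - M)$, closes that gap.

```python
import numpy as np


def best_rank_r(A, r):
    """Best rank-r approximation of A in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def synthetic_target(n, r, scale=10.0):
    """M = scale * diag(1, ..., 1, 0, ..., 0) with r + 1 ones, as in the caption."""
    d = np.zeros(n)
    d[: r + 1] = 1.0
    return scale * np.diag(d)


def loss(W, M):
    """Objective of the synthetic example: ||W - M||_F^2."""
    return float(np.linalg.norm(W - M, "fro") ** 2)


# Hypothetical sizes chosen only for this illustration.
n, r = 8, 3
M = synthetic_target(n, r)
W0 = np.zeros((n, n))

# A single rank-r update from W0: the best any fixed rank-r subspace can do.
W_lowrank = W0 + best_rank_r(M - W0, r)
loss_lowrank = loss(W_lowrank, M)

# "Restart": pick a new rank-r subspace from the current residual M - W,
# which is proportional to the negative full gradient -2 (W - M).
W_restart = W_lowrank + best_rank_r(M - W_lowrank, r)
loss_restart = loss(W_restart, M)

print(f"loss after one rank-{r} update: {loss_lowrank:.6f}")
print(f"loss after a second subspace:   {loss_restart:.6f}")
```

Under these assumptions the first loss is $10^2 = 100$ up to floating-point error, and the second is numerically zero, because the residual after the first step has rank one and fits inside the new rank-$r$ subspace.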

Theorems & Definitions (11)

  • Theorem 5.1
  • Remark 1
  • Remark 2
  • Proposition G.1
  • Theorem G.2
  • proof
  • proof: Proof of Theorem \ref{thm:conv_stochastic}
  • Lemma G.3
  • proof
  • Theorem G.4
  • ...and 1 more