PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection

Junseo Hwang, Wonguk Cho, Taesup Kim

TL;DR

PiCa introduces Parameter-efficient Fine-tuning with Column Space Projection, a theoretically grounded PEFT method that projects gradient updates onto the principal column space spanned by the top-$r$ left singular vectors $U_r$ of the pre-trained weight $W_0$, and augments this with a weight-sharing scheme. The authors prove that the optimal rank-$r$ component of the update $\Delta W$ lies approximately in this subspace, with a bounded approximation error, and show that projecting updates sequentially during training approximates projecting the accumulated update. Empirically, PiCa outperforms state-of-the-art baselines across NLP tasks (mathematical and commonsense reasoning, NLU) and vision tasks (VTAB-1K, DreamBooth) with substantially fewer trainable parameters, demonstrating both theoretical soundness and practical impact for scalable deployment.
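
The core projection step can be illustrated with a short NumPy sketch. It is a minimal example, not the authors' code: random matrices stand in for $W_0$ and for a gradient, and the helper names `principal_column_basis` and `project_onto_column_space` are hypothetical. It computes $U_r$ by SVD, forms $U_r U_r^\top G$, and checks that the result factors as $U_r B$ with $B = U_r^\top G$.

```python
import numpy as np


def principal_column_basis(W0: np.ndarray, r: int) -> np.ndarray:
    """Return U_r, the top-r left singular vectors of W0 (shape m x r)."""
    U, _, _ = np.linalg.svd(W0, full_matrices=False)
    return U[:, :r]


def project_onto_column_space(G: np.ndarray, U_r: np.ndarray) -> np.ndarray:
    """Orthogonally project G onto span(U_r), i.e. compute U_r (U_r^T G)."""
    # Multiplying by U_r^T first avoids forming the m x m projector explicitly.
    return U_r @ (U_r.T @ G)


rng = np.random.default_rng(0)
m, n, r = 64, 48, 8
W0 = rng.standard_normal((m, n))  # stand-in for a pre-trained weight
G = rng.standard_normal((m, n))   # stand-in for a gradient or update

U_r = principal_column_basis(W0, r)
G_proj = project_onto_column_space(G, U_r)

# The projected update factors as U_r @ B with B = U_r^T G of shape (r, n),
# so only r * n numbers describe it once U_r is frozen.
B = U_r.T @ G
assert np.allclose(G_proj, U_r @ B)
# Projection is idempotent: projecting twice changes nothing.
assert np.allclose(project_onto_column_space(G_proj, U_r), G_proj)
# The residual G - G_proj is orthogonal to span(U_r).
assert np.allclose(U_r.T @ (G - G_proj), 0.0, atol=1e-10)
```

Writing the projection as $U_r(U_r^\top G)$ keeps the cost at $O(mnr)$ and never materializes an $m \times m$ matrix.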

Abstract

Fine-tuning large foundation models is essential for building expert models tailored to specialized tasks and domains, but fully updating billions of parameters is computationally prohibitive. Reducing the number of trainable parameters using parameter-efficient fine-tuning (PEFT) is therefore crucial not only to reduce training costs but also to mitigate storage, caching, and serving overheads during deployment. Prior works, such as Singular Vectors-guided Fine-Tuning, have shown that exploiting the geometry of pre-trained weights can significantly improve parameter efficiency, but they lack a solid theoretical foundation. In this paper, we introduce Parameter-efficient Fine-tuning with Column Space Projection (PiCa), a novel theoretically grounded PEFT method. We prove that projecting gradients onto the principal column space of pre-trained weights provides an effective inductive bias for adaptation, and we further enhance parameter efficiency through a novel weight-sharing strategy. Across diverse NLP and vision tasks, PiCa consistently outperforms state-of-the-art baselines under comparable or smaller parameter budgets, demonstrating both theoretical rigor and practical effectiveness.
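
To make the parameter-budget comparison concrete, the arithmetic below counts trainable parameters for one $m \times n$ weight per layer. It assumes, as an illustration only, an update of the form $U_r B$ with $U_r$ frozen and $B \in \mathbb{R}^{r \times n}$ trainable, compared against LoRA's $r(m+n)$ per layer. The `share=True` branch models a hypothetical sharing scheme in which one $B$ is reused across layers with the same input dimension; the paper's actual weight-sharing strategy may differ. The sizes are illustrative, not taken from any model.

```python
def lora_params(m: int, n: int, r: int, num_layers: int) -> int:
    """LoRA trains B (m x r) and A (r x n) in every adapted m x n layer."""
    return num_layers * r * (m + n)


def column_projection_params(n: int, r: int, num_layers: int, share: bool) -> int:
    """Trainable count for an update of the assumed form U_r @ B.

    U_r (m x r) is frozen, so only B (r x n) is trained. share=True models a
    hypothetical scheme in which a single B is reused by all num_layers layers
    that have the same input dimension n.
    """
    per_layer = r * n
    return per_layer if share else num_layers * per_layer


# Illustrative sizes only, not taken from any specific model.
m = n = 1024
r, num_layers = 16, 12
print(lora_params(m, n, r, num_layers))                          # 393216
print(column_projection_params(n, r, num_layers, share=False))  # 196608
print(column_projection_params(n, r, num_layers, share=True))   # 16384
```

Under these assumptions, freezing $U_r$ halves the per-layer count relative to LoRA when $m = n$, and sharing $B$ removes the dependence on the number of layers.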

Paper Structure

This paper contains 34 sections, 8 theorems, 74 equations, 3 figures, 10 tables, 1 algorithm.

Key Result

Lemma 3.1

Let $W_0,W^*\in\mathbb{R}^{m\times n}$ with $W^*=W_0+\Delta W$. Let $U_r,U_r^*$ denote the top-$r$ left singular-vector matrices of $W_0$ and $W^*$. Define the gap Then for any unitarily invariant norm,
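
Lemma 3.1 is a Wedin-type perturbation bound. As an illustration only, the sketch below numerically checks one commonly quoted spectral-norm form of Wedin's $\sin\Theta$ theorem, $\|\sin\Theta(U_r, U_r^*)\|_2 \le \|\Delta W\|_2 / \delta$ with $\delta = \sigma_r(W^*) - \sigma_{r+1}(W_0) > 0$. This particular gap and form are assumptions and may differ from the lemma as stated in the paper. The matrix $W_0$ is synthetic, built with a clear gap after the $r$-th singular value.

```python
import numpy as np


def top_left_singular(W: np.ndarray, r: int) -> np.ndarray:
    """Top-r left singular vectors of W."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :r]


def sin_theta_spectral(U_a: np.ndarray, U_b: np.ndarray) -> float:
    """Largest sine of the principal angles between span(U_a) and span(U_b)."""
    # The singular values of (I - U_a U_a^T) U_b are the sines of the angles.
    residual = U_b - U_a @ (U_a.T @ U_b)
    return float(np.linalg.norm(residual, ord=2))


rng = np.random.default_rng(1)
m, n, r = 40, 30, 4

# Synthetic W0 with singular values 10, 9, 8, 7, then a gap down to 1.0 ... 0.1.
Q1, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.concatenate([[10.0, 9.0, 8.0, 7.0], np.linspace(1.0, 0.1, n - r)])
W0 = Q1[:, :n] @ np.diag(sigma) @ Q2.T

# Perturbation Delta W rescaled to spectral norm 0.1.
E = rng.standard_normal((m, n))
E *= 0.1 / np.linalg.norm(E, ord=2)
W_star = W0 + E

U_r = top_left_singular(W0, r)
U_r_star = top_left_singular(W_star, r)

s0 = np.linalg.svd(W0, compute_uv=False)
s_star = np.linalg.svd(W_star, compute_uv=False)
delta = s_star[r - 1] - s0[r]  # assumed gap: sigma_r(W*) - sigma_{r+1}(W0)
bound = np.linalg.norm(E, ord=2) / delta

print(sin_theta_spectral(U_r, U_r_star), bound)
```

When the gap is large relative to $\|\Delta W\|_2$, the top-$r$ column space barely rotates, which is the regime in which $U_r$ of $W_0$ is a good proxy for $U_r^*$.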

Figures (3)

  • Figure 1: Average accuracy as a function of the number of trainable parameters on Commonsense Reasoning datasets using Gemma-2B. PiCa demonstrates superior performance compared to baseline methods with similar parameter budgets.
  • Figure 2: Distribution of the perturbations $E_{ij}^P$ and $E_{ij}^Q$ across all weight-matrix elements of DeBERTaV3-base. Most values are tightly concentrated around zero, indicating that the $\mathcal{O}(\epsilon)$ terms are negligible in practice.
  • Figure 3: Ablation study of weight sharing across different datasets and rank configurations.

Theorems & Definitions (11)

  • Lemma 3.1: wedin1972perturbation
  • Theorem 1: Approximation error of projection onto $U_r$
  • Definition 1: L-smoothness for matrix-valued functions
  • Theorem 2: Sequential projection approximates accumulated projection
  • Lemma A.1: Weyl’s Inequality weyl1912asymptotische
  • Lemma A.2: Invariance of Frobenius Norm
  • Lemma A.3: Orthogonal projection is non-expansive in Frobenius norm
  • Theorem 1: Approximation error of projection onto $U_r$
  • Proof
  • Theorem 2: Sequential projection approximates accumulated projection
  • ...and 1 more
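
Theorem 2 relates sequentially projected updates to the projection of the accumulated update; its precise assumptions are not reproduced here. In the idealized case of a single frozen basis $U_r$ computed from $W_0$, the two coincide exactly because the projector is linear. The sketch below checks this on a hypothetical toy loss $f(W) = \tfrac12\|W - T\|_F^2$, which is L-smooth with $L = 1$ in the usual sense $\|\nabla f(X) - \nabla f(Y)\|_F \le L\|X - Y\|_F$. The target $T$, step size, and step count are illustrative assumptions. This is the fixed-basis special case only, not a reproduction of the theorem's setting.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 32, 24, 6
eta, steps = 0.1, 50

W0 = rng.standard_normal((m, n))
T = rng.standard_normal((m, n))  # hypothetical regression target


def grad(W: np.ndarray) -> np.ndarray:
    """Gradient of f(W) = 0.5 * ||W - T||_F^2 (L-smooth with L = 1)."""
    return W - T


U, _, _ = np.linalg.svd(W0, full_matrices=False)
U_r = U[:, :r]  # frozen basis computed once from W0


def project(G: np.ndarray) -> np.ndarray:
    """Orthogonal projection onto span(U_r)."""
    return U_r @ (U_r.T @ G)


W = W0.copy()
raw_sum = np.zeros_like(W0)
for _ in range(steps):
    g = grad(W)
    raw_sum += g
    W -= eta * project(g)  # sequential: project every step's gradient

delta_seq = W - W0                  # accumulated sequentially projected update
delta_acc = -eta * project(raw_sum)  # project the accumulated gradient once

# With a fixed projector the two agree up to floating-point error.
assert np.allclose(delta_seq, delta_acc)

# Numerical check of L-smoothness with L = 1 for this loss.
X, Y = rng.standard_normal((m, n)), rng.standard_normal((m, n))
assert np.linalg.norm(grad(X) - grad(Y)) <= 1.0 * np.linalg.norm(X - Y) + 1e-12
```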