PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning

Xin Yu; Cong Xie; Ziyu Zhao; Tiantian Fan; Lingzhou Xue; Zhi Zhang

TL;DR

This work addresses LoRA's limited representational capacity relative to full fine-tuning by starting from an over-parameterized low-rank space and applying gradient-based structured pruning to arrive at a compact, expressive adapter. It introduces PrunedLoRA, which jointly prunes the LoRA submatrices $A$ and $B$ via a Hessian-informed mask and second-order updates, enabling adaptive rank reduction during fine-tuning. The authors provide a theoretical analysis showing that, in a toy self-attention setting, gradient-based pruning is more robust to weight perturbations than activation-based pruning, and they report consistent empirical gains on mathematical reasoning and natural language understanding tasks at sparsity levels from $50\%$ to $93\%$. Overall, PrunedLoRA narrows the gap between LoRA and full fine-tuning while maintaining inference efficiency, with stronger performance when initialized with higher ranks and pruned gradually.

Abstract

Low-rank adaptation (LoRA) has become a widely used paradigm for parameter-efficient fine-tuning of large language models, yet its representational capacity often lags behind full fine-tuning. Within the context of LoRA, a key open question is how to obtain expressive low-rank adapters from over-parameterized spaces. We propose PrunedLoRA, a new framework that leverages structured pruning to obtain highly representative low-rank adapters from an over-parameterized initialization. Unlike prior approaches that impose a fixed low-rank budget, PrunedLoRA dynamically prunes less important components during fine-tuning and prevents their reactivation, enabling flexible and adaptive rank allocation. For structured pruning, we derive fine-grained pruning and recovery updates by minimizing the pruning-induced error in the overall loss, which yields a gradient-based pruning strategy with a grounded interpretation. We provide the first theoretical analysis of the robustness of structured pruning and prove that, under weight perturbation, gradient-based pruning is more robust than activation-based pruning with respect to the overall loss. Empirically, PrunedLoRA consistently outperforms LoRA and its variants across supervised fine-tuning tasks in mathematical reasoning, code generation, and natural language understanding, and it also demonstrates advantages over existing structured pruning methods across diverse sparsity levels.
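
The idea of pruning rank components of an over-parameterized adapter and keeping them switched off can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not state the scoring rule, so the code below substitutes a simple first-order (gradient times weight) saliency score and omits the Hessian-informed recovery update entirely. All function names (`rank_scores`, `prune_step`, `mask_gradients`) and the toy numbers are hypothetical and chosen only for illustration. The adapter is assumed to be the product $BA$, so rank component $i$ consists of row $i$ of $A$ and column $i$ of $B$.

```python
# Illustrative sketch only: first-order rank scoring for a LoRA-style adapter B @ A.
# NOT the paper's Hessian-informed rule; names and numbers are hypothetical.

def rank_scores(A, B, grad_A, grad_B):
    """Score each rank component i (row i of A together with column i of B).

    The score |<grad, weight>| is a first-order Taylor estimate of how much
    the loss would change if component i were set to zero.
    """
    scores = []
    for i in range(len(A)):
        s = sum(g * w for g, w in zip(grad_A[i], A[i]))
        s += sum(grad_B[k][i] * B[k][i] for k in range(len(B)))
        scores.append(abs(s))
    return scores


def prune_step(A, B, grad_A, grad_B, alive, n_prune):
    """Zero the n_prune lowest-scoring live components; update `alive` in place."""
    scores = rank_scores(A, B, grad_A, grad_B)
    # Only components that are still alive are candidates for pruning.
    candidates = sorted(
        (i for i in range(len(A)) if alive[i]), key=lambda i: scores[i]
    )
    for i in candidates[:n_prune]:
        alive[i] = False
        A[i] = [0.0] * len(A[i])
        for row in B:
            row[i] = 0.0
    return alive


def mask_gradients(grad_A, grad_B, alive):
    """Zero gradients of pruned components so a plain gradient step leaves them at zero."""
    for i, keep in enumerate(alive):
        if not keep:
            grad_A[i] = [0.0] * len(grad_A[i])
            for row in grad_B:
                row[i] = 0.0


# Toy adapter with rank 3: A is 3 x 2, B is 2 x 3.
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
B = [[1.0, 0.1, 0.0], [0.0, 0.1, 1.0]]
grad_A = [[0.2, 0.0], [0.01, 0.01], [0.0, -0.3]]
grad_B = [[0.1, 0.0, 0.0], [0.0, 0.0, 0.2]]

alive = [True, True, True]
prune_step(A, B, grad_A, grad_B, alive, n_prune=1)
mask_gradients(grad_A, grad_B, alive)
print(alive)
```

In this toy example the middle component has the smallest score and is removed first. Calling `prune_step` repeatedly with a small `n_prune` mimics gradual pruning from a high initial rank. Note that optimizers with momentum or decoupled weight decay would need additional masking of their state to guarantee that pruned components stay at zero.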
