
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring

Xuan Cui, Huiyue Li, Run Zeng, Yunfei Zhao, Jinrui Qian, Wei Duan, Bo Liu, Zhanpeng Zhou

Abstract

As large language models (LLMs) scale to billions of parameters, full-parameter fine-tuning becomes compute- and memory-prohibitive. Parameter-efficient fine-tuning (PEFT) mitigates this issue by updating only a small set of task-specific parameters while keeping the base model frozen. Among PEFT approaches, low-rank adaptation (LoRA) is widely adopted; however, it enforces a uniform rank across layers despite substantial variation in layer importance, motivating layerwise rank allocation. Recent adaptive-rank variants (e.g., AdaLoRA) allocate ranks based on importance scores, yet typically rely on instantaneous gradients that capture only local sensitivity, overlooking non-local, pathwise effects within the same layer, which yields unstable and biased scores. To address this limitation, we introduce IGU-LoRA, an adaptive-rank LoRA that (i) computes within-layer Integrated Gradients (IG) sensitivities and aggregates them into a layer-level score for rank allocation, and (ii) applies an uncertainty-aware scheme using exponential moving averages with deviation tracking to suppress noisy updates and calibrate rank selection. Theoretically, we prove an upper bound on the composite trapezoidal rule approximation error for parameter-space IG under a pathwise Hessian-Lipschitz condition, which informs the quadrature budget. Across diverse tasks and architectures, IGU-LoRA consistently outperforms strong PEFT baselines at matched parameter budgets, improving downstream accuracy and robustness. Ablations confirm the contributions of pathwise within-layer sensitivity estimates and uncertainty-aware selection to effective rank allocation. Our code is publicly available at https://github.com/withyou12/igulora.git
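
Mechanism (ii) and the layer-level rank allocation can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation. The EMA update and the product of the smoothed score and its smoothed deviation follow AdaLoRA's formulation and are only assumed to resemble IGU-LoRA's scheme. Averaging parameter scores within a layer and splitting a total rank budget proportionally, with a floor of `r_min` per layer, are also assumptions. `UncertaintyEMA`, `layer_scores`, and `allocate_ranks` are hypothetical names, and raw scores are assumed nonnegative (e.g., absolute IG values).

```python
from dataclasses import dataclass, field


@dataclass
class UncertaintyEMA:
    """Hypothetical helper: EMA of importance plus EMA of its deviation.

    Assumed AdaLoRA-style form (not confirmed for IGU-LoRA):
      mean <- beta1 * mean + (1 - beta1) * s
      dev  <- beta2 * dev  + (1 - beta2) * |s - mean|
      score = mean * dev
    """
    beta1: float = 0.85
    beta2: float = 0.85
    mean: dict = field(default_factory=dict)
    dev: dict = field(default_factory=dict)

    def update(self, raw_scores):
        """raw_scores: {param_name: nonnegative per-step importance}."""
        smoothed = {}
        for name, s in raw_scores.items():
            m = self.beta1 * self.mean.get(name, 0.0) + (1 - self.beta1) * s
            u = self.beta2 * self.dev.get(name, 0.0) + (1 - self.beta2) * abs(s - m)
            self.mean[name], self.dev[name] = m, u
            smoothed[name] = m * u
        return smoothed


def layer_scores(param_scores, layer_of):
    """Average parameter-level scores within each layer (assumed aggregation)."""
    totals, counts = {}, {}
    for name, s in param_scores.items():
        layer = layer_of(name)
        totals[layer] = totals.get(layer, 0.0) + s
        counts[layer] = counts.get(layer, 0) + 1
    return {layer: totals[layer] / counts[layer] for layer in totals}


def allocate_ranks(scores, total_rank, r_min=1):
    """Split total_rank across layers in proportion to their scores.

    Every layer keeps at least r_min; leftover units go to the largest
    fractional shares (largest-remainder rounding).
    """
    layers = sorted(scores)
    free = total_rank - r_min * len(layers)
    if free < 0:
        raise ValueError("total_rank is smaller than r_min * number of layers")
    z = sum(scores.values())
    if z > 0:
        shares = {layer: free * scores[layer] / z for layer in layers}
    else:
        shares = {layer: free / len(layers) for layer in layers}
    ranks = {layer: r_min + int(shares[layer]) for layer in layers}
    leftover = total_rank - sum(ranks.values())
    by_remainder = sorted(
        layers, key=lambda layer: shares[layer] - int(shares[layer]), reverse=True
    )
    for layer in by_remainder[:leftover]:
        ranks[layer] += 1
    return ranks


# Usage with made-up per-step scores for two layers (L0, L1), two matrices each.
ema = UncertaintyEMA(beta1=0.8, beta2=0.8)
steps = [
    {"L0.A": 1.0, "L0.B": 0.5, "L1.A": 0.1, "L1.B": 0.2},
    {"L0.A": 0.9, "L0.B": 0.6, "L1.A": 0.3, "L1.B": 0.1},
]
for raw in steps:
    smoothed = ema.update(raw)
per_layer = layer_scores(smoothed, lambda name: name.split(".")[0])
ranks = allocate_ranks(per_layer, total_rank=16, r_min=2)
print(ranks)
```

With these made-up scores, layer L0 receives most of the 16-rank budget (`{'L0': 13, 'L1': 3}`). The `beta1` smoothing dampens step-to-step noise in the raw scores, and the deviation term records how much each score fluctuates. AdaLoRA multiplies the two. How IGU-LoRA combines them is specified in the paper, not in this sketch.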

Paper Structure (44 sections, 2 theorems, 46 equations, 7 figures, 12 tables, 1 algorithm)

Key Result

Theorem 1

Let $s_e(w_{ij})$ be the importance score based on Integrated Gradients (IG) as defined in Eq. (IG_Label), and let $s_{agg}(w_{ij})$ be the epoch-level estimator as defined in Eq. (importanceScoreInEpoch). Define $g_{ij}(\alpha) = \frac{\partial \mathcal{L}(\alpha \Delta \mathbf{W})}{\partial w_{ij}}$. Then, for any $N, M \geq 1$ and $\delta \in (0,1)$, with probability at least $1 - \delta$, the following …
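
A toy problem can illustrate the quadrature error that Theorem 1 bounds. The sketch assumes the standard IG form with a zero-update baseline and a straight-line path, $s_e(w_{ij}) \approx \Delta W_{ij} \int_0^1 g_{ij}(\alpha)\, d\alpha$. The exact Eq. (IG_Label) is not reproduced here. The integral is estimated with the composite trapezoidal rule over `n_steps` subintervals, and whether `n_steps` corresponds to the paper's $N$ is an assumption. The loss `loss`, its gradient `grad`, the vector `A`, and `integrated_gradients` are hypothetical stand-ins, not the released code.

```python
import math

# Hypothetical toy loss: L(w) = softplus(a . w), smooth in the flattened
# update w. A is an assumed fixed "data" vector.
A = [2.0, -1.0, 0.5]


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def loss(w):
    return math.log1p(math.exp(_dot(A, w)))


def grad(w):
    # dL/dw_j = sigmoid(a . w) * a_j
    s = 1.0 / (1.0 + math.exp(-_dot(A, w)))
    return [s * a for a in A]


def integrated_gradients(grad_fn, delta_w, n_steps):
    """Composite trapezoidal estimate of
    IG_j = delta_w[j] * integral_0^1 dL(alpha * delta_w)/dw_j d(alpha)
    along the straight path from the zero update to delta_w."""
    acc = [0.0] * len(delta_w)
    for k in range(n_steps + 1):
        alpha = k / n_steps
        weight = 0.5 if k in (0, n_steps) else 1.0  # endpoints get half weight
        g = grad_fn([alpha * x for x in delta_w])
        for j, gj in enumerate(g):
            acc[j] += weight * gj
    return [dw * s / n_steps for dw, s in zip(delta_w, acc)]


delta_w = [1.0, -0.5, 1.0]  # a . delta_w = 3
target = loss(delta_w) - loss([0.0, 0.0, 0.0])  # completeness: sum of exact IG
errors = {}
for n in (4, 8, 16):
    ig = integrated_gradients(grad, delta_w, n)
    errors[n] = abs(sum(ig) - target)
    print(n, [round(v, 4) for v in ig], f"{errors[n]:.2e}")
```

By completeness, the exact IG values sum to $\mathcal{L}(\Delta\mathbf{W}) - \mathcal{L}(\mathbf{0})$, so the printed error isolates the quadrature error. Doubling `n_steps` should cut it by roughly a factor of four, which is the generic $O(1/N^2)$ rate of the trapezoidal rule for smooth integrands. The constants in Theorem 1 depend on the paper's pathwise Hessian-Lipschitz condition, which this toy does not check.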

Figures (7)

  • Figure 1: Comparison of frameworks, from left to right: (a) LoRA, (b) AdaLoRA, (c) the proposed IGU-LoRA. IGU-LoRA builds on LoRA and AdaLoRA, introducing integrated gradients (IG) to compute parameter importance scores.
  • Figure 2: Comparison of parameter importance scoring methods. (a) The simple gradient method fails in saturated regions, assigning near-zero importance. (b) Integrated gradients compute importance by integrating along the path from initial to final parameter values, capturing the actual total contribution.
  • Figure 3: The impact of the hyperparameters $M, N, \beta_1, \beta_2$ on performance when fine-tuning the Qwen2.5-0.5B model on the BoolQ dataset.
  • Figure 4: The impact of the hyperparameters $M, N, \beta_1, \beta_2$ on performance when fine-tuning the Qwen2.5-0.5B model on the GSM8K dataset.
  • Figure 5: Rank allocation by IGU-LoRA on the Qwen2.5-0.5B backbone after fine-tuning on the BoolQ and GSM8K tasks.
  • ...and 2 more figures
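
A one-parameter toy, assumed here purely for illustration, reproduces the saturation argument of Figure 2. It uses $f(w) = \mathrm{sigmoid}(w)$, with the parameter moving from 0 to 6, deep into the flat region. The simple score multiplies the displacement by the gradient at the final value. IG instead averages the gradient along the path, here with the composite trapezoidal rule and `n = 64` steps, an arbitrary choice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)


# Hypothetical one-parameter "loss" f(w) = sigmoid(w); w moves from 0 to 6,
# ending deep in the saturated region.
w0, w1 = 0.0, 6.0
delta = w1 - w0

# (a) Simple gradient score at the final point: |delta * f'(w1)|
grad_score = abs(delta * d_sigmoid(w1))

# (b) Integrated gradients along w0 -> w1, composite trapezoid with n steps
n = 64
samples = [d_sigmoid(w0 + (k / n) * delta) for k in range(n + 1)]
ig_score = delta * (sum(samples) - 0.5 * (samples[0] + samples[-1])) / n

total_change = sigmoid(w1) - sigmoid(w0)
print(f"gradient score: {grad_score:.4f}")
print(f"IG score:       {ig_score:.4f}")
print(f"f(w1) - f(w0):  {total_change:.4f}")
```

The gradient score comes out near 0.015 because $f'(6)$ is almost zero. The IG score recovers essentially the full change $f(6) - f(0) \approx 0.4975$.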

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof