GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning

Abdessalam Ed-dib, Zhanibek Datbayev, Amine Mohamed Aboussalah

TL;DR

The paper addresses the high computational cost of fine-tuning large language models by proposing GeLoRA, a geometry-aware, parameter-efficient fine-tuning method that sets the LoRA rank of each transformer layer from the intrinsic dimension (idim) of its hidden representations. It provides a theoretical framework linking the intrinsic dimension of a layer's representations to the LoRA capacity that layer needs, and uses the TwoNN estimator to compute per-layer dimensions, which yields a lower bound on the required update rank. Empirically, GeLoRA achieves state-of-the-art or competitive results on GLUE and SQuAD under tight parameter budgets, outperforming existing LoRA variants and adapters while preserving efficiency. The approach offers both practical improvements for model personalization and theoretical insight into why intermediate-task tuning can be effective in certain regimes.

Abstract

Fine-tuning large language models (LLMs) is computationally intensive because it requires updating all parameters. Low-Rank Adaptation (LoRA) improves efficiency by modifying only a subset of weights but introduces a trade-off between expressivity and computational cost: lower ranks reduce resources but limit expressiveness, while higher ranks enhance expressivity at increased cost. Despite recent advances in adaptive LoRA techniques, existing methods fail to provide a theoretical basis for optimizing the trade-off between model performance and efficiency. We propose Geometric Low-Rank Adaptation (GeLoRA), a novel framework that computes the intrinsic dimensionality of hidden state representations to adaptively select LoRA ranks. We demonstrate that the intrinsic dimension provides a lower bound for the optimal rank of LoRA matrices, allowing for a principled selection that balances efficiency and expressivity. GeLoRA dynamically adjusts the rank for each layer based on the intrinsic dimensionality of its input and output representations, recognizing that not all model parameters equally impact fine-tuning. Empirical validation on multiple tasks shows that GeLoRA consistently outperforms recent baselines within the same parameter budget.
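The rank-selection idea can be made concrete with a minimal sketch. It assumes a maximum-likelihood variant of the TwoNN estimator named in the TL;DR (the paper's exact fitting procedure may differ) and a purely hypothetical rule mapping consecutive layer dimensions to ranks. The function names `twonn_dimension`, `ranks_from_dimensions`, and `helix_points` are illustrative and do not come from the paper. Only the standard library is used to keep the example self-contained; a real pipeline would more likely use NumPy or a k-d tree on actual hidden states.

```python
import heapq
import math
import random


def twonn_dimension(points):
    """TwoNN maximum-likelihood estimate: N / sum_i log(r2_i / r1_i)."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances to all other points (O(N^2) overall).
        others = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = heapq.nsmallest(2, others)
        if r1 > 0.0:  # exact duplicates make the ratio undefined; skip them
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def ranks_from_dimensions(layer_dims):
    """ASSUMED rule, for illustration only: rank = max(ceil(d_out - d_in), 0) + 1."""
    return [
        max(math.ceil(d_out - d_in), 0) + 1
        for d_in, d_out in zip(layer_dims, layer_dims[1:])
    ]


def helix_points(n, seed=0):
    """Sample n points from a helix: a 1-D curve embedded in 3-D."""
    rng = random.Random(seed)
    ts = [rng.uniform(0.0, 4.0 * math.pi) for _ in range(n)]
    return [(math.cos(t), math.sin(t), 0.5 * t) for t in ts]


if __name__ == "__main__":
    print(twonn_dimension(helix_points(500)))         # should be close to 1
    print(ranks_from_dimensions([10.0, 12.5, 11.0]))  # [4, 1]
```

In this sketch, a layer whose output representation has a higher estimated dimension than its input receives a larger rank, and every layer keeps a rank of at least 1. The list passed to `ranks_from_dimensions` is made-up data, not a result from the paper.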

Paper Structure

This paper contains 36 sections, 6 theorems, 27 equations, 10 figures, 13 tables, 1 algorithm.

Key Result

Theorem 3.1

The intrinsic dimension $\hat{\text{idim}}(\phi)$ is a lower bound on the local dimensionality $d(\phi)$.

Figures (10)

  • Figure 1: Assume that locally around $\Theta^{(0)}$, the loss function can be approximated by $\mathcal{L}(\theta_1, \theta_2) = \frac{1}{2} \theta_1^2$. In this scenario, the loss landscape exhibits a single free direction. The loss depends exclusively on $\theta_1$, while $\theta_2$ has no influence on it. As a result, changing $\theta_2$ alone does not affect the loss, making $\theta_2$ a free direction in the landscape. In contrast, variations in $\theta_1$ lead to changes in the loss, meaning that the zero-loss set forms a line along the $\theta_2$-axis. Therefore, the local dimensionality of the low-loss region is $1$.
  • Figure 2: Schematic of the GeLoRA methodology. The process includes intrinsic dimension analysis (Step 1), setting minimal LoRA ranks based on these dimensions (Step 2), and performing efficient fine-tuning to achieve an optimal balance between computational efficiency and model expressivity (Step 3).
  • Figure 3: Intrinsic dimension profiles of the RTE and STS-B datasets under DeBERTaV3, before and after intermediate-task tuning on MRPC.
  • Figure 4: A helical curve in 3D space with an intrinsic dimension of 1, fully described by a single parameter despite its 3D embedding.
  • Figure 5: GeLoRA rank pattern for CoLA
  • ...and 5 more figures
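
The toy loss in the Figure 1 caption can be checked by hand. The following is a sketch that assumes the local dimensionality counts the free (flat) directions of the loss around $\Theta^{(0)}$, as the caption describes; the Hessian notation $H$ is introduced here for illustration and is not taken from the paper.

$$
\mathcal{L}(\theta_1, \theta_2) = \tfrac{1}{2}\theta_1^2
\quad\Longrightarrow\quad
H = \nabla^2 \mathcal{L} =
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\qquad
\operatorname{rank}(H) = 1,
\qquad
\ker(H) = \operatorname{span}\{(0, 1)^\top\}.
$$

The zero-loss set is $\{(\theta_1, \theta_2) : \theta_1 = 0\}$, i.e. the $\theta_2$-axis. Its dimension equals the number of parameters minus the rank of $H$, which is $2 - 1 = 1$, matching the single free direction stated in the caption.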

Theorems & Definitions (16)

  • Definition 3.1: Local Dimensionality
  • Theorem 3.1: Intrinsic Dimension as a Lower Bound
  • Theorem 3.2: Rank Bound of Transformer Blocks
  • Corollary 3.2.1: Bound on Parameters for Transformer Block Optimization
  • Conjecture 3.1: Transformer Rank Bound Dynamics
  • Definition A.1: Intrinsic Dimensionality
  • Definition A.2: Single-head Self-attention Layer
  • Definition A.3: Multi-head Self Attention Layer
  • Theorem B.1: Intrinsic Dimension as a Lower Bound
  • proof
  • ...and 6 more