RankAdaptor: Hierarchical Rank Allocation for Efficient Fine-Tuning Pruned LLMs via Performance Model

Changhai Zhou, Shijie Han, Lining Yang, Yuhua Zhou, Xu Cheng, Yibin Wang, Hongguang Li

TL;DR

RankAdaptor is a hierarchical rank allocation method that fine-tunes pruned LLMs efficiently according to layer-specific recovery requirements. It consistently outperforms state-of-the-art methods across a variety of pruning settings and LLM architectures.

Abstract

The efficient compression of large language models (LLMs) has become increasingly popular. However, recovering the performance of compressed LLMs remains a major challenge. Current practice in LLM compression applies structural pruning, followed by a recovery phase that uses the Low-Rank Adaptation (LoRA) algorithm. Structural pruning modifies the model architecture unevenly, while standard LoRA allocates a fixed configuration across layers in an online pipeline; together, these lead to suboptimal performance of pruned models on various downstream tasks. To address this challenge, we introduce RankAdaptor, a hierarchical rank allocation method that enables efficient fine-tuning of pruned LLMs according to layer-specific recovery requirements. We employ a performance model that conducts offline meta-learning and online incremental learning to explore optimal rank values for each layer. Comprehensive experiments on popular benchmarks show that RankAdaptor consistently outperforms state-of-the-art methods across a variety of pruning settings and LLM architectures, with improvements ranging from 0.7% to 5.5%.
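
To make the contrast between a fixed rank and per-layer ranks concrete, here is a minimal sketch. It relies only on the standard LoRA parameterization, in which a weight update for a `d_out x d_in` matrix is factored as `B @ A` with `B` of shape `d_out x r` and `A` of shape `r x d_in`, so a layer with rank `r` adds `r * (d_out + d_in)` trainable parameters. The layer shapes, the rank values, and the function names are hypothetical illustrations; they are not taken from the paper, and the sketch does not implement RankAdaptor's performance model or its rank search.

```python
from typing import List, Tuple

Shape = Tuple[int, int]  # (d_out, d_in) of one weight matrix


def lora_params(shape: Shape, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter of the given rank."""
    d_out, d_in = shape
    # B is d_out x rank, A is rank x d_in.
    return rank * (d_out + d_in)


def total_lora_params(shapes: List[Shape], ranks: List[int]) -> int:
    """Sum adapter parameters over layers, one rank per layer."""
    if len(shapes) != len(ranks):
        raise ValueError("need exactly one rank per layer")
    return sum(lora_params(s, r) for s, r in zip(shapes, ranks))


# Hypothetical shapes: structural pruning leaves layers with uneven widths.
shapes = [(4096, 4096), (3072, 4096), (2048, 4096)]

# Standard LoRA: the same rank for every layer.
uniform_ranks = [8, 8, 8]

# Hierarchical allocation: a different rank per layer (values made up here).
layerwise_ranks = [4, 8, 16]

print("uniform  :", total_lora_params(shapes, uniform_ranks))
print("layerwise:", total_lora_params(shapes, layerwise_ranks))
```

In this framing, a rank configuration is just a list with one integer per layer, which is the object a rank search would have to choose.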

Paper Structure (39 sections, 5 equations, 6 figures, 7 tables)


Figures (6)

  • Figure 1: Illustration of the process of pruning and recovery. The baseline approach is detailed in Section \ref{sec: bg}, and the proposed method is described in Section \ref{sec: method}.
  • Figure 2: Benchmark performance under different fine-tuning configurations. LoRA denotes using a fixed rank for all layers, whereas LoRA$^*$ denotes using different ranks for different layers. Results are reported as percentages (%).
  • Figure 3: Hierarchical weight matrices decomposition: same rank in LoRA (left) versus hierarchical different ranks in RankAdaptor (right).
  • Figure 4: RankAdaptor workflow: three phases (Initialization, Iteration, Convergence) find the optimal hierarchical rank configuration for recovering a pruned LLM.
  • Figure 5: Article continuation task comparison in LLaMA-7B
  • ...and 1 more figures