VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

Yang Li, Shaobo Han, Shihao Ji

TL;DR

This work proposes a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules, and layers by sharing parameters globally via a vector bank. The resulting method achieves extreme parameter efficiency while maintaining performance comparable to or better than state-of-the-art PEFT methods.

Abstract

As the adoption of large language models increases and the need for per-user or per-task model customization grows, parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules, and layers by sharing parameters globally via a vector bank. As an instantiation of the paradigm to LoRA, our proposed VB-LoRA composes all the low-rank matrices of LoRA from a shared vector bank with a differentiable top-k admixture module. VB-LoRA achieves extreme parameter efficiency while maintaining performance comparable to or better than state-of-the-art PEFT methods. Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, instruction tuning, and mathematical reasoning tasks. When fine-tuning the Llama2-13B model, VB-LoRA uses only 0.4% of LoRA's stored parameters, yet achieves superior results. Our source code is available at https://github.com/leo-yangli/VB-LoRA. This method has been merged into the Hugging Face PEFT package.
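
To make the top-k admixture idea concrete, here is a minimal sketch in plain Python of how one sub-vector could be pooled from a shared vector bank. It is an illustration under assumptions, not the authors' implementation: the function name `topk_admixture`, the variable names, the vector length of 8, and the choice of k = 2 are all hypothetical, and the softmax is taken over the selected logits only.

```python
import math
import random

def topk_admixture(logits, bank, k=2):
    """Pool the k highest-scoring bank vectors into one sub-vector.

    logits: list of h selection scores for one sub-vector (hypothetical name)
    bank:   list of h vectors, each a list of b floats
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the selected logits (shifted for stability).
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected bank vectors.
    b = len(bank[0])
    sub_vector = [sum(w * bank[i][j] for w, i in zip(weights, top))
                  for j in range(b)]
    return sub_vector, top, weights

random.seed(0)
h, b = 90, 8  # assumed sizes: 90 bank vectors, each of (arbitrary) length 8
bank = [[random.gauss(0, 1) for _ in range(b)] for _ in range(h)]
logits = [random.gauss(0, 1) for _ in range(h)]
sub_vector, idx, weights = topk_admixture(logits, bank, k=2)
```

In an automatic-differentiation framework, gradients would flow to both the bank vectors and the selected logits through the softmax weights, which is what makes this kind of selection trainable.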

Paper Structure

This paper contains 29 sections, 3 equations, 9 figures, 10 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Comparison of the PEFT methods on RoBERTa-Large. Our VB-LoRA achieves higher scores with a significantly smaller number of stored parameters.
  • Figure 2: Left: The model parameters can be represented as a composition of vectors from a vector bank, which is shared across sub-vectors, modules and layers. Right: Architecture of VB-LoRA. We use a top-$k$ softmax function to select $k$ vectors from the vector bank. The selected vectors are then pooled into a sub-vector, which is arranged at a desired position, forming the parameters of LoRA.
  • Figure 3: VB-LoRA's vector selection footprints during training. The x-axis represents the 96 sub-vectors formed by the vectors from a bank of 90 vectors, while the y-axis represents the indices of selected vectors from the bank. The blue blocks indicate the selection footprint during training.
  • Figure 4: The x-axis represents the 192 sub-vectors formed by the vectors in the vector bank, while the y-axis represents the 30 vectors in the vector bank. The vectors initially selected by each sub-vector are shown in red, the vectors finally selected are shown in blue, and the overlapping vectors are shown in green.
  • Figure 5: VB-LoRA’s vector selection footprints during training. The x-axis represents the 96 sub-vectors formed by the vectors from a bank of 90 vectors, while the y-axis represents the indices of selected vectors from the bank. The blue blocks indicate the selection footprint during training.
  • ...and 4 more figures
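
The Figure 2 caption describes pooled sub-vectors being arranged into position to form the LoRA parameters. The sketch below, in plain Python, shows one way this could look. All sizes, the helper names `pool` and `compose_matrix`, and the row-by-row layout of sub-vectors are assumptions chosen for illustration; the paper's actual arrangement and hyperparameters may differ.

```python
import math
import random

def pool(logits, bank, k):
    """Top-k softmax pooling of bank vectors into one sub-vector of length b."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [sum(e / total * bank[i][j] for e, i in zip(exps, top))
            for j in range(len(bank[0]))]

def compose_matrix(rows, cols, bank, all_logits, k=2):
    """Build a rows x cols matrix by laying pooled sub-vectors end to end."""
    b = len(bank[0])
    assert cols % b == 0, "cols must be a multiple of the sub-vector length"
    per_row = cols // b
    matrix = []
    for r in range(rows):
        row = []
        for s in range(per_row):
            row.extend(pool(all_logits[r * per_row + s], bank, k))
        matrix.append(row)
    return matrix

random.seed(0)
h, b, rank, d = 90, 4, 2, 16   # illustrative sizes, not the paper's settings
bank = [[random.gauss(0, 0.02) for _ in range(b)] for _ in range(h)]
n_sub = rank * d // b          # sub-vectors needed per low-rank factor
logits_A = [[random.gauss(0, 1) for _ in range(h)] for _ in range(n_sub)]
logits_B = [[random.gauss(0, 1) for _ in range(h)] for _ in range(n_sub)]

# Both factors draw from the same bank; only their logits differ.
A = compose_matrix(rank, d, bank, logits_A)    # rank x d
B_T = compose_matrix(rank, d, bank, logits_B)  # rank x d, i.e. B transposed
# Low-rank update: delta_W[i][j] = sum over r of B[i][r] * A[r][j]
delta_W = [[sum(B_T[r][i] * A[r][j] for r in range(rank)) for j in range(d)]
           for i in range(d)]
```

Because every factor in every module and layer would read from the same `bank`, the per-matrix state in this sketch reduces to the selection logits, which is the sharing effect the caption describes.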