3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability

Baohao Liao, Christof Monz

TL;DR

RoAd is a novel method that adapts LLMs with a straightforward 2D rotation. It addresses three challenges at once: parameter-efficient finetuning, efficient serving of requests that need different adapters within the same batch, and LLM interpretability, the last through integration within a framework of distributed interchange intervention.

Abstract

Parameter-efficient finetuning (PEFT) methods effectively adapt large language models (LLMs) to diverse downstream tasks, reducing storage and GPU memory demands. Despite these advantages, several applications pose new challenges to PEFT beyond mere parameter efficiency. One notable challenge involves the efficient deployment of LLMs equipped with multiple task- or user-specific adapters, particularly when different adapters are needed for distinct requests within the same batch. Another challenge is the interpretability of LLMs, which is crucial for understanding how LLMs function. Previous studies introduced various approaches to address different challenges. In this paper, we introduce a novel method, RoAd, which employs a straightforward 2D rotation to adapt LLMs and addresses all the above challenges: (1) RoAd is remarkably parameter-efficient, delivering optimal performance on GLUE, eight commonsense reasoning tasks and four arithmetic reasoning tasks with $<0.1\%$ trainable parameters; (2) RoAd facilitates the efficient serving of requests requiring different adapters within a batch, with an overhead comparable to element-wise multiplication instead of batch matrix multiplication; (3) RoAd enhances LLM's interpretability through integration within a framework of distributed interchange intervention, demonstrated via composition experiments.
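The batching claim in point (2) can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the adaptation acts on a hidden vector as a block-diagonal matrix of scaled 2x2 rotations, and that adjacent dimensions form the rotated pairs; the names `rotate_pairs`, `rotate_batch`, `theta` and `alpha` are hypothetical and not taken from the paper. Under those assumptions, each output element is a sum of two element-wise products, so requests with different adapters need only their own parameter vectors rather than a per-request matrix multiplication.

```python
import math


def rotate_pairs(h, theta, alpha):
    """Rotate each pair (h[2i], h[2i+1]) by theta[i] and scale it by alpha[i].

    Assumption: adjacent dimensions are paired. The result equals h multiplied
    by a block-diagonal matrix of scaled 2x2 rotations, but it is computed
    with element-wise products only.
    """
    if len(h) != 2 * len(theta) or len(theta) != len(alpha):
        raise ValueError("h must have two entries per (theta, alpha) pair")
    out = []
    for i, (t, a) in enumerate(zip(theta, alpha)):
        x, y = h[2 * i], h[2 * i + 1]
        c, s = a * math.cos(t), a * math.sin(t)
        out.append(c * x - s * y)  # first row of the scaled rotation block
        out.append(s * x + c * y)  # second row of the scaled rotation block
    return out


def rotate_batch(batch, thetas, alphas):
    """Apply a different (hypothetical) adapter to every request in a batch."""
    return [rotate_pairs(h, t, a) for h, t, a in zip(batch, thetas, alphas)]


if __name__ == "__main__":
    # Two requests with the same hidden vector but different adapters.
    batch = [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
    thetas = [[0.0, 0.0], [math.pi, math.pi / 2]]
    alphas = [[1.0, 1.0], [2.0, 1.0]]
    for row in rotate_batch(batch, thetas, alphas):
        print([round(v, 6) for v in row])
```

In this sketch an adapter stores one angle and one scale per pair of dimensions, which is why its parameter count grows linearly with the hidden size. A vectorized implementation would replace the Python loop with two element-wise tensor products.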

Paper Structure (23 sections, 4 equations, 9 figures, 15 tables)

Figures (9)

  • Figure 1: Performance of various PEFT methods on the GLUE benchmark, eight commonsense reasoning tasks and four arithmetic reasoning tasks with RoBERTa-large or LLaMA-13B.
  • Figure 2: Pilot study for the pretrained and finetuned representations. Left & Middle: The change in magnitude and angle of representations between pretrained and finetuned LLM using full finetuning or LoRA. Right: The disentanglement experiment of magnitude and angle of pretrained representation.
  • Figure 3: Overview of RoAd$_1$.
  • Figure 4: Comparison of throughput between LoRA and RoAd. Left: The influence of weight merging for LoRA. Middle: The influence of the number of generated tokens. Right: The influence of the number of heterogeneous requests in a batch.
  • Figure 5: Qualitative comparison between RoAd and LoReFT for their composability. The prompt for different subspaces is always in English. Refer to Figures \ref{fig: composition all 1}, \ref{fig: composition all 2} and \ref{fig: composition all 3} for more examples.
  • ...and 4 more figures