Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng

TL;DR

GenerativeAdapter is an effective and efficient adaptation method that maps new contexts directly to low-rank LM adapters, significantly reducing inference overhead without any fine-tuning. The results suggest that it should allow general adaptation to a wide range of contexts.

Abstract

Large language models (LMs) are typically adapted to improve performance on new contexts (e.g., text prompts that define new tasks or domains) through fine-tuning or prompting. However, there is an accuracy-compute tradeoff: fine-tuning incurs significant training cost, and prompting increases inference overhead. We introduce GenerativeAdapter, an effective and efficient adaptation method that directly maps new contexts to low-rank LM adapters, thereby significantly reducing inference overhead with no need for fine-tuning. The adapter generator is trained via self-supervised learning and can be used to adapt a single frozen LM to any new task simply by mapping the associated task or domain context to a new adapter. We apply GenerativeAdapter to two pretrained LMs (Mistral-7B-Instruct and Llama2-7B-Chat) and evaluate the adapted models in three adaptation scenarios: knowledge acquisition from documents, learning from demonstrations, and personalization for users. On StreamingQA, our approach is effective in injecting knowledge into the LM's parameters, achieving a 63.5% improvement in F1 score over the model with supervised fine-tuning (from 19.5 to 31.5) for contexts as long as 32K tokens. In the MetaICL in-context learning evaluation, our method achieves an average accuracy of 44.9 across 26 tasks, outperforming the base model. On MSC, our method is highly competitive in memorizing user information from conversations, with a 4x reduction in computation and memory costs compared to prompting with the full conversation history. Together, these results suggest that GenerativeAdapter should allow general adaptation to a wide range of contexts.
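
To make "low-rank LM adapter" concrete, the following minimal sketch shows how a frozen weight matrix can be combined with a low-rank update of the form $W + AB^\top$. It is only an illustration in plain Python: the helper names (`matmul`, `apply_adapter`), the 3x3 shapes, and the values of `A` and `B` are assumptions made for the example. In GenerativeAdapter the adapter would be produced by the trained generator from the context, which is not modeled here.

```python
# Toy illustration: a frozen weight W plus a low-rank update A @ B^T.
# Shapes and values are invented for the example; this is not the paper's generator.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def apply_adapter(W, A, B):
    """Return the adapted weight W + A @ B^T without modifying W."""
    return add(W, matmul(A, transpose(B)))

# Frozen 3x3 base weight (identity, for readability).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Rank-1 adapter factors, each of shape (3, 1).
A = [[1.0], [0.0], [2.0]]
B = [[0.5], [1.0], [0.0]]

W_adapted = apply_adapter(W, A, B)
x = [[1.0, 1.0, 1.0]]                # one input row vector
y = matmul(x, transpose(W_adapted))  # y = x @ W_adapted^T
```

Because the context is absorbed into `A` and `B`, the cost of computing `y` depends on the layer size and the adapter rank, not on the length of the original context, which is the intuition behind the reduced inference overhead claimed above.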

Paper Structure

This paper contains 26 sections, 7 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of GenerativeAdapter. Left: During test-time contextualization, the adapters $\Delta_1, \ldots, \Delta_t$ are generated sequentially for the stream of context chunks $C_1, \ldots, C_t$. At a given time step $t$, the context chunk $C_t$ is encoded by the base LM $\Theta_{\rm base}$ into hidden state vectors $\mathbf{H}_t$. Then the generator $\mathcal{G}$ produces a new adapter $\Delta_t$ based on the collection of hidden state vectors $\mathbf{H}_1, \ldots, \mathbf{H}_t$ representing the accumulated context. Right: During inference, we combine the latest adapter $\Delta_t$ with the base LM $\Theta_{\rm base}$ to generate responses for input prompts.
  • Figure 2: Document QA Performance on SQuAD and StreamingQA across varying context lengths. For each point, the QA accuracy (F1 score) is calculated based on the same set of test questions. Both fine-tuning methods (supervised fine-tuning and continuous pretraining) are evaluated in a closed-book manner with constant QA performance across varying context lengths.
  • Figure 3: Computation and storage requirements for GenerativeAdapter and baseline methods on StreamingQA. For GenerativeAdapter, the context is converted into an adapter during contextualization and then stored for inference. For the prompting method, the key-value (KV) cache can be generated during contextualization and reused during inference.
  • Figure 4: Accuracy plots on MetaICL with varying K-shot in-context examples. Both fine-tuned and zero-shot prompting baselines are instructed to complete the task without any in-context examples.
  • Figure 5: In-context learning evaluation of GenerativeAdapter, based on Llama2-7B-Chat, across 26 test datasets from MetaICL.
  • ...and 2 more figures
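
The streaming procedure described in the Figure 1 caption can be sketched as a simple loop: each chunk $C_t$ is encoded into hidden states $\mathbf{H}_t$, a running summary of $\mathbf{H}_1, \ldots, \mathbf{H}_t$ is updated, and a new adapter $\Delta_t$ is produced from that summary. The sketch below is a toy under stated assumptions: `encode_chunk` is a hypothetical stand-in for the base LM encoder, and the outer-product accumulation in `update_state` with the scaling in `generate_adapter` is an assumed placeholder for the generator $\mathcal{G}$, not the architecture used in the paper.

```python
# Toy sketch of the streaming loop in Figure 1. encode_chunk and the
# outer-product "generator" are hypothetical stand-ins, not the paper's modules.

def encode_chunk(chunk, dim=2):
    """Stand-in for the base LM encoder: one small vector per token."""
    return [[float(len(tok) % (j + 2)) for j in range(dim)] for tok in chunk.split()]

def update_state(state, H):
    """Accumulate sum_i h_i h_i^T so the state summarizes H_1..H_t."""
    for h in H:
        for r in range(len(h)):
            for c in range(len(h)):
                state[r][c] += h[r] * h[c]
    return state

def generate_adapter(state, scale=0.1):
    """Stand-in for the generator G: a scaled copy of the accumulated state."""
    return [[scale * v for v in row] for row in state]

def contextualize(chunks, dim=2):
    """Return adapters Delta_1..Delta_t, one per incoming context chunk."""
    state = [[0.0] * dim for _ in range(dim)]
    adapters = []
    for chunk in chunks:
        H = encode_chunk(chunk, dim)      # hidden states H_t for chunk C_t
        state = update_state(state, H)    # summary of H_1..H_t
        adapters.append(generate_adapter(state))
    return adapters

adapters = contextualize(["the cat sat", "on a mat"])
latest = adapters[-1]  # Delta_t, the adapter that would be combined with the base LM
```

Only a fixed-size state is carried between chunks, so each new chunk is processed once and earlier chunks never need to be re-encoded. At inference time only `latest` would be combined with the frozen base model.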