StreamAdapter: Efficient Test Time Adaptation from Contextual Streams

Dilxat Muhtar, Yelong Shen, Yaming Yang, Xiaodong Liu, Yadong Lu, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Xueliang Zhang, Jianfeng Gao, Weizhu Chen, Qi Zhang

TL;DR

StreamAdapter is a novel approach that updates model parameters directly from context at test time, eliminating the need for explicit in-context demonstrations. This significantly reduces inference costs and allows efficient inference with constant time complexity, regardless of demonstration count.

Abstract

In-context learning (ICL) allows large language models (LLMs) to adapt to new tasks directly from the given demonstrations without requiring gradient updates. While recent advances have expanded context windows to accommodate more demonstrations, this approach increases inference costs without necessarily improving performance. To mitigate these issues, we propose StreamAdapter, a novel approach that directly updates model parameters from context at test time, eliminating the need for explicit in-context demonstrations. StreamAdapter employs context mapping and weight absorption mechanisms to dynamically transform ICL demonstrations into parameter updates with minimal additional parameters. By reducing reliance on numerous in-context examples, StreamAdapter significantly reduces inference costs and allows for efficient inference with constant time complexity, regardless of demonstration count. Extensive experiments across diverse tasks and model architectures demonstrate that StreamAdapter achieves comparable or superior adaptation capability to ICL while requiring significantly fewer demonstrations. The superior task adaptation and context encoding capabilities of StreamAdapter on both language understanding and generation tasks provide a new perspective for adapting LLMs at test time using context, allowing for more efficient adaptation across scenarios and more cost-effective inference.
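As a rough illustration of the weight absorption idea, the sketch below folds a small context state into a weight matrix through two low-rank factors. It is only a toy under stated assumptions: the update form `weight + up @ state @ down`, the names `absorb`, `down`, `up`, and `state`, and the toy sizes are all hypothetical and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def absorb(weight, down, state, up):
    """Return weight + up @ state @ down.

    Assumed (hypothetical) form of a low-rank update in which a small
    r x r context state sits between the two low-rank factors.
    """
    delta = matmul(matmul(up, state), down)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy sizes: a 2 x 2 weight and rank r = 1.
weight = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 2.0]]     # r x d_in
up = [[1.0], [0.5]]     # d_out x r
state = [[2.0]]         # r x r context state (would be derived from the context)

adapted = absorb(weight, down, state, up)
print(adapted)
```

Once the update is absorbed, a query is processed with `adapted` alone, so in this toy setting the per-query cost does not depend on how many demonstrations produced `state`.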

Paper Structure

This paper contains 47 sections, 8 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Overall structure of StreamAdapter. StreamAdapter maps the KV cache into a context state using intra-chunk cross-attention and inter-chunk recurrence, then connects two low-rank matrices through the context state to update the model parameters, absorbing context information into the model weights
  • Figure 2: Training strategy of StreamAdapter. The sliding-window strategy accumulates loss from each step in a sequence and updates StreamAdapter's parameters after the entire sequence has been processed. The in-context training employs a 2-forward-1-backward strategy: the first forward pass computes the KV cache without gradient computation, while the second forward pass updates the model parameters using the KV cache from the first forward pass and calculates the loss to update the parameters introduced by StreamAdapter
  • Figure 3: Comparison of various methods across different tasks with different numbers of demonstrations
  • Figure 4: Perplexity gap between TTA methods and sliding window strategy across varying maximum context lengths on the PG19 test set
  • Figure 5: Generation latency and peak memory consumption across different prefill lengths. $\dagger$ indicates adaptation using sequential chunk-wise strategy, as directly mapping all prefill context leads to out-of-memory
  • ...and 4 more figures
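The Figure 1 caption describes mapping the KV cache into a context state with intra-chunk cross-attention and inter-chunk recurrence. The sketch below shows one way such a mapping could look. It is a hedged toy, not the paper's formulation: the single query vector, the fixed `decay` blend used as the recurrence, and the function names `cross_attend` and `map_context` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Let one query vector attend over a chunk of cached keys and values."""
    scale = math.sqrt(len(query))
    weights = softmax(
        [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    )
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def map_context(keys, values, query, chunk_size, decay=0.5):
    """Fold a KV cache into a fixed-size state, one chunk at a time.

    The decay blend is a hypothetical stand-in for inter-chunk recurrence.
    """
    state = [0.0] * len(values[0])
    for start in range(0, len(keys), chunk_size):
        summary = cross_attend(
            query,
            keys[start:start + chunk_size],
            values[start:start + chunk_size],
        )
        state = [decay * s + (1 - decay) * x for s, x in zip(state, summary)]
    return state


# The state has the same size for a 4-token and an 8-token cache.
query = [1.0, 0.0]
short_state = map_context([[0.0, 0.0]] * 4, [[1.0, 2.0]] * 4, query, chunk_size=2)
long_state = map_context([[0.0, 0.0]] * 8, [[1.0, 2.0]] * 8, query, chunk_size=2)
print(len(short_state), len(long_state))
```

In this toy the state size is set by the value dimension rather than the cache length, which is the property that would let a downstream weight update stay the same size as more context is streamed in.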