PICASO: Permutation-Invariant Context Composition with State Space Models

Tian Yu Liu, Alessandro Achille, Matthew Trager, Aditya Golatkar, Luca Zancato, Stefano Soatto

TL;DR

PICASO introduces a permutation-invariant, state-space-based approach to efficiently compose multiple retrieved contexts for generation. By storing pre-computed context states and using CASO with permutation-invariant averaging (PICASO-S and PICASO-R), the method achieves near-concatenation performance with significantly reduced online cost. The authors provide polynomial/linear-time algorithms, derive a supporting bound, and show that fine-tuning with BPTC/BP2C closes the remaining performance gap, enabling scalable retrieval-augmented generation on long contexts. This yields practical speedups (≈5.4x) and robust performance across WikiText-V2 and MSMARCO, while maintaining model capabilities on standard LLM tasks. The work highlights a scalable path for integrating large numbers of retrieved contexts without sacrificing efficiency or accuracy.

Abstract

Providing Large Language Models with relevant contextual knowledge at inference time has been shown to greatly improve the quality of their generations. This is often achieved by prepending informative passages of text, or 'contexts', retrieved from external knowledge bases to their input. However, processing additional contexts online incurs significant computation costs that scale with their length. State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states from which to start the generation. A key challenge arises when attempting to leverage information present across multiple contexts, since there is no straightforward way to condition generation on multiple independent states in existing SSMs. To address this, we leverage a simple mathematical relation derived from SSM dynamics to compose multiple states into one that efficiently approximates the effect of concatenating raw context tokens. Since the temporal ordering of contexts can often be uninformative, we enforce permutation-invariance by efficiently averaging states obtained via our composition algorithm across all possible context orderings. We evaluate our resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that we can match the strongest performing baseline while enjoying a 5.4x speedup on average.
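The two ideas in the abstract, composing pre-computed states and averaging over context orderings, can be illustrated with a minimal NumPy sketch. It assumes a toy diagonal linear recurrence x_t = a_t * x_{t-1} + b_t with zero initial state; the names `encode`, `compose`, `average_over_orderings`, and `state_db` are hypothetical, and the brute-force loop over all n! orderings is a stand-in for illustration, not the paper's efficient algorithms or the Mamba-2 implementation.

```python
import itertools
import numpy as np

DIM = 4  # toy state dimension (assumption for illustration)

def encode(a_seq, b_seq):
    """Run x_t = a_t * x_{t-1} + b_t from x_0 = 0 over one context.

    Returns the final state and the cumulative decay prod_t a_t.
    """
    state, decay = np.zeros(DIM), np.ones(DIM)
    for a, b in zip(a_seq, b_seq):
        state = a * state + b
        decay = decay * a
    return state, decay

def compose(pairs):
    """Combine stored (state, decay) pairs in order, using arithmetic only."""
    x = np.zeros(DIM)
    for state, decay in pairs:
        x = decay * x + state
    return x

def average_over_orderings(pairs):
    """Brute-force mean of composed states over all n! orderings."""
    perms = list(itertools.permutations(pairs))
    return sum(compose(p) for p in perms) / len(perms)

# Offline: map each toy "context" (random per-token a_t, b_t) to a stored state.
rng = np.random.default_rng(0)
raw = [(rng.uniform(0.5, 1.0, (n, DIM)), rng.normal(size=(n, DIM)))
       for n in (3, 5, 2)]
state_db = [encode(a, b) for a, b in raw]

# Online: compose stored states without touching any tokens.
composed = compose(state_db)

# Reference: run the recurrence over the concatenated contexts.
a_cat = np.concatenate([a for a, _ in raw])
b_cat = np.concatenate([b for _, b in raw])
concatenated, _ = encode(a_cat, b_cat)

# Averaging over orderings removes the dependence on input order.
invariant = average_over_orderings(state_db)
invariant_reversed = average_over_orderings(state_db[::-1])

print(np.allclose(composed, concatenated))
print(np.allclose(invariant, invariant_reversed))
```

In this toy setting the composed state matches the state obtained from the concatenated contexts exactly, and the averaged state is the same whichever order the stored states are supplied in. The brute-force average costs n! compositions, which is why an efficient averaging scheme matters in practice.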

Paper Structure

This paper contains 31 sections, 5 theorems, 18 equations, 9 figures, 4 tables, 3 algorithms.

Key Result

Proposition 1

Let $\bm{u}_1,\ldots,\bm{u}_n$ be a collection of input sequences and let $\bm{u}=\bm{u}_1 \cdots \bm{u}_n$ be their concatenation. Then, for an SSM layer that evolves according to the SSM state equation (eq:SSM in the paper), we have
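The displayed equation of the proposition is not reproduced on this page. As a hedged illustration of the kind of relation involved, the derivation below assumes a generic linear recurrence $x_t = A_t x_{t-1} + B_t u_t$ with $x_0 = 0$, where $A_t$ and $B_t$ depend only on the current input $u_t$. The notation $x(\bm{u})$ for the final state after reading $\bm{u}$, $A(\bm{u})$ for the product of its transition matrices, and $T_i$ for the length of $\bm{u}_i$ is introduced here for illustration and may differ from the paper's. Products are ordered with later time indices on the left, and an empty product is the identity.

```latex
\begin{align*}
x(\bm{u}_1 \bm{u}_2)
  &= \sum_{t=1}^{T_1+T_2} \Big(\prod_{s=t+1}^{T_1+T_2} A_s\Big) B_t u_t \\
  &= \Big(\prod_{s=T_1+1}^{T_1+T_2} A_s\Big)
     \sum_{t=1}^{T_1} \Big(\prod_{s=t+1}^{T_1} A_s\Big) B_t u_t
     \;+\; \sum_{t=T_1+1}^{T_1+T_2} \Big(\prod_{s=t+1}^{T_1+T_2} A_s\Big) B_t u_t \\
  &= A(\bm{u}_2)\, x(\bm{u}_1) + x(\bm{u}_2).
\end{align*}
% Applying the same split repeatedly to n contexts gives
\[
x(\bm{u}_1 \cdots \bm{u}_n)
  = x(\bm{u}_n) + \sum_{i=1}^{n-1} A(\bm{u}_n) \cdots A(\bm{u}_{i+1})\, x(\bm{u}_i).
\]
```

Under these assumptions, the state of the concatenation is determined entirely by the per-context pairs $(x(\bm{u}_i), A(\bm{u}_i))$, so it can be assembled from stored quantities without reprocessing any tokens.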

Figures (9)

  • Figure 1: (Left:) We propose a “Database of States,” where contexts are stored as pre-processed state vectors. Given a query, relevant states are then retrieved and composed into a single state vector which is used to condition the model’s generation. (Right:) We plot the increase in total time required to generate an additional 64 tokens, when concatenating a 64-token prompt with retrieved contexts. We model the time taken for PICASO-R as the time taken to combine 5 pre-processed context states, which involves only arithmetic operations and notably zero model processing time. As a result, the processing and inference costs for PICASO-R remain constant regardless of the length of retrieved contexts. In contrast, the timings for a Transformer model scale quadratically, and for an SSM linearly, with total length when generating from concatenated context tokens. These timings are measured using the official Mamba benchmarking code, which includes optimizations such as quantization and CUDA graphs for SSMs, and flash attention for Transformers.
  • Figure 2: Left: Naive averaging ("Soup") of context states. Right: Averaging CASO states. CASO states are “closer” to one another (see Proposition) and hence can be more meaningfully interpolated. In contrast, naively averaged states of independent contexts do not possess this property. Both plots are computed over 10 samples of (query, continuation, retrieved contexts).
  • Figure 3: Zero-shot evaluation of PICASO using Mamba-2 compared to other composition methods on WikiText. While the performance of PICASO lags slightly behind that of concatenation (left), PICASO-R is on average $5.4\times$ faster (right). PICASO-S and PICASO-R perform similarly and yield overlapping curves (hence not visible in the left plot). Incorporating permutation invariance for concatenation via PIConcat-R gives the best results. However, it incurs orders of magnitude higher computational cost despite being performed within a single batched forward pass, so we omit it from the right plot to avoid distorting the scale of the x-axis and to keep the comparison focused on PICASO.
  • Figure 4: (Left + Middle:) Fine-tuning with BPTC on WikiText brings the performance of PICASO to that of concatenation, while retaining its significant speed advantages. (Right:) Fine-tuning with BP2C on WikiText improves the effectiveness of PICASO as well, but is much faster in terms of training time since it does not require backpropagating through the composed state. Note that fine-tuning has no impact on the actual composition time when used for inference.
  • Figure 5: Timings for different composition algorithms evaluated on WikiText using Mamba-2 2.7B (zero-shot), including that of PIConcat-R. While PIConcat results in the best performance (y-axis), its computational cost (x-axis) is significantly higher than that of other methods. We refer to Figure 3 for a more condensed plot that compares the remaining methods.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Proposition 1: CASO
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • proof