PICASO: Permutation-Invariant Context Composition with State Space Models
Tian Yu Liu, Alessandro Achille, Matthew Trager, Aditya Golatkar, Luca Zancato, Stefano Soatto
TL;DR
PICASO introduces a permutation-invariant, state-space-based approach to efficiently compose multiple retrieved contexts for generation. By storing pre-computed context states and composing them with CASO under permutation-invariant averaging (PICASO-S and PICASO-R), the method achieves near-concatenation performance at significantly reduced online cost. The authors provide polynomial/linear-time algorithms, derive a supporting bound, and show that fine-tuning with BPTC/BP2C closes the remaining performance gap, enabling scalable retrieval-augmented generation on long contexts. This yields an average speedup of about 5.4x and robust performance across WikiText-V2 and MSMARCO, while maintaining model capabilities on standard LLM tasks. The work highlights a scalable path for integrating large numbers of retrieved contexts without sacrificing efficiency or accuracy.
Abstract
Providing Large Language Models with relevant contextual knowledge at inference time has been shown to greatly improve the quality of their generations. This is often achieved by prepending informative passages of text, or 'contexts', retrieved from external knowledge bases to their input. However, processing additional contexts online incurs significant computation costs that scale with their length. State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states from which to start the generation. A key challenge arises when attempting to leverage information present across multiple contexts, since there is no straightforward way to condition generation on multiple independent states in existing SSMs. To address this, we leverage a simple mathematical relation derived from SSM dynamics to compose multiple states into one that efficiently approximates the effect of concatenating raw context tokens. Since the temporal ordering of contexts can often be uninformative, we enforce permutation-invariance by efficiently averaging states obtained via our composition algorithm across all possible context orderings. We evaluate our resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that we can match the strongest performing baseline while enjoying a 5.4x speedup on average.
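The composition idea can be illustrated with a minimal sketch, assuming a toy single-layer diagonal SSM with the recurrence x_t = a_t * x_{t-1} + b_t. Everything named here (`gates`, `encode`, `compose`, `compose_all`, `permutation_invariant_state`, and the gate formulas) is hypothetical and chosen for illustration; it is not the authors' implementation. In this toy setting the state of a concatenation is recovered exactly from per-context states and their cumulative decays, whereas the abstract describes the relation only as an approximation for actual models. The averaging step below enumerates all orderings by brute force, which costs factorial time and is not the efficient algorithm the paper refers to.

```python
from itertools import permutations


def gates(token):
    """Hypothetical input-dependent gates (a, b) for a 2-dim diagonal SSM."""
    a = [0.5 + 0.04 * (token % 10), 0.9 - 0.03 * (token % 7)]  # decays in (0, 1)
    b = [0.1 * token, 1.0]
    return a, b


def encode(tokens):
    """Run x_t = a_t * x_{t-1} + b_t from a zero state.

    Returns (cumulative decay over the context, final state). Both can be
    pre-computed offline and stored per context.
    """
    x = [0.0, 0.0]
    decay = [1.0, 1.0]
    for t in tokens:
        a, b = gates(t)
        x = [ai * xi + bi for ai, xi, bi in zip(a, x, b)]
        decay = [ai * di for ai, di in zip(a, decay)]
    return decay, x


def compose(first, second):
    """State of `first` followed by `second`, without re-reading any tokens."""
    d1, x1 = first
    d2, x2 = second
    decay = [p * q for p, q in zip(d1, d2)]
    state = [q * s + t for q, s, t in zip(d2, x1, x2)]
    return decay, state


def compose_all(encoded):
    """Left-to-right composition of a sequence of encoded contexts."""
    out = encoded[0]
    for item in encoded[1:]:
        out = compose(out, item)
    return out


def permutation_invariant_state(encoded):
    """Brute-force average of composed states over every context ordering."""
    finals = [compose_all(list(order))[1] for order in permutations(encoded)]
    return [sum(column) / len(finals) for column in zip(*finals)]


contexts = [[1, 2, 3], [4, 5], [6]]
encoded = [encode(c) for c in contexts]

composed_state = compose_all(encoded)[1]
concat_state = encode([1, 2, 3, 4, 5, 6])[1]
averaged_state = permutation_invariant_state(encoded)
averaged_reversed = permutation_invariant_state(encoded[::-1])
```

Here `composed_state` agrees with `concat_state` up to floating-point error, and `averaged_state` does not depend on the order in which the contexts are supplied. The online work involves only the stored (decay, state) pairs, which is the source of the cost saving over re-processing raw tokens.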
