Studying the Soupability of Documents in State Space Models
Yasaman Jafari, Zixian Wang, Leon Bergen, Taylor Berg-Kirkpatrick
TL;DR
This work investigates whether document representations encoded independently by Structured State Space Models (SSMs) can be merged post hoc to support multi-document reasoning. By pooling per-document hidden states with simple operators (primarily averaging) and conditioning a decoder on the resulting souped state, the authors demonstrate performance competitive with or superior to monolithic encoding on multi-hop QA, long-document reading, and sparse retrieval tasks after appropriate finetuning. Key findings show that encoder–decoder finetuning is essential for soupability, that averaging is the most robust pooling method, and that the approach scales to large corpora with substantial inference-time caching benefits. The results highlight a modular, cache-friendly workflow for large-scale corpus reasoning that is particularly well suited to SSMs and offers practical advantages over traditional concatenation in dynamic retrieval scenarios.
Abstract
We investigate whether hidden states from Structured State Space Models (SSMs) can be merged post hoc to support downstream reasoning. Inspired by model souping, we study document souping, a strategy where documents are encoded independently, and their representations are pooled, via simple operations like averaging, into a single context state. This approach enables modular encoding and reuse without reprocessing the full input for each query. We demonstrate that finetuned Mamba2 models with souped representations achieve competitive or superior performance across multi-hop QA, sparse retrieval, and long-document reasoning tasks compared to the standard monolithic encoding approach. For example, on the RACE and QuALITY benchmarks for long document question answering, this method substantially outperforms a traditional concatenation approach. Crucially, this modular design scales to hundreds of documents while delivering substantial savings in inference cost, unlocking new possibilities for large-scale corpus reasoning.
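The encode-independently, pool, and reuse workflow described above can be illustrated with a minimal sketch. It does not use Mamba2: it assumes a toy diagonal linear recurrence h_t = A * h_{t-1} + B x_t as a stand-in for an SSM encoder, random vectors in place of token embeddings, and hypothetical names (`encode_document`, `soup`, `state_cache`, `souped_context`) chosen only for illustration. The decoder that would be conditioned on the souped state, and the finetuning the abstract reports as necessary, are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, EMBED_DIM = 8, 4

# Toy SSM parameters (assumption: diagonal state transition, not Mamba2's actual layers).
A = rng.uniform(0.5, 0.99, size=STATE_DIM)   # per-channel decay
B = rng.normal(size=(STATE_DIM, EMBED_DIM))  # input projection


def encode_document(token_embeddings):
    """Run the toy recurrence from a zero state and return the final hidden state."""
    h = np.zeros(STATE_DIM)
    for x in token_embeddings:
        h = A * h + B @ x
    return h


def soup(states):
    """Pool per-document states into one context state by averaging."""
    return np.mean(np.stack(states), axis=0)


# Stand-in corpus: each document is a sequence of random embeddings of varying length.
corpus = {
    f"doc{i}": rng.normal(size=(int(rng.integers(5, 20)), EMBED_DIM))
    for i in range(5)
}

# Encode every document once; the cached states can be reused across queries.
state_cache = {doc_id: encode_document(emb) for doc_id, emb in corpus.items()}


def souped_context(doc_ids):
    """Build a context state for a query from cached states, with no re-encoding."""
    return soup([state_cache[d] for d in doc_ids])


context = souped_context(["doc0", "doc3"])
print(context.shape)
```

Because each document is encoded from the same initial state, the cached states are independent of which other documents a query retrieves, which is what makes the per-query cost a cheap pooling step rather than a full pass over the concatenated input.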
