
Studying the Soupability of Documents in State Space Models

Yasaman Jafari, Zixian Wang, Leon Bergen, Taylor Berg-Kirkpatrick

TL;DR

This work investigates whether document representations independently encoded by Structured State Space Models (SSMs) can be merged post hoc to support multi-document reasoning. By pooling per-document hidden states with simple operators (primarily averaging) and conditioning a decoder on the resulting souped state, the authors show that, after appropriate finetuning, this approach achieves performance competitive with or superior to monolithic encoding on multi-hop QA, long-document reading, and sparse retrieval tasks. The key findings are that encoder–decoder finetuning is essential for soupability, that averaging is the most robust pooling operator, and that the approach scales to large corpora while yielding substantial inference-time savings through caching. The results point to a modular, cache-friendly workflow for large-scale corpus reasoning that is particularly well suited to SSMs and offers practical advantages over traditional concatenation in dynamic retrieval settings.
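The cache-friendly workflow can be illustrated with a minimal sketch. It assumes a toy encoder, `toy_encode` (a diagonal linear recurrence over pseudo-random token embeddings), as a stand-in for a finetuned Mamba2 encoder; `SoupCache`, `STATE_DIM`, and the tiny corpus are hypothetical names chosen for illustration. Each document's final state is computed once and cached, and any retrieved subset is souped by averaging cached states, so a second query over an overlapping subset only encodes the documents it has not seen before.

```python
import numpy as np

STATE_DIM = 8  # hypothetical hidden-state size for this toy example


def toy_encode(doc_tokens: list[int], decay: float = 0.9) -> np.ndarray:
    """Hypothetical stand-in for an SSM encoder: runs the diagonal recurrence
    h_t = decay * h_{t-1} + embed(x_t) from a zero state and returns h_T."""
    h = np.zeros(STATE_DIM)
    for tok in doc_tokens:
        # Deterministic pseudo-embedding: seed an RNG with the token id.
        emb = np.random.default_rng(tok).standard_normal(STATE_DIM)
        h = decay * h + emb
    return h


class SoupCache:
    """Caches one final state per document id and soups any subset by averaging."""

    def __init__(self, corpus: dict[str, list[int]]):
        self.corpus = corpus
        self.states: dict[str, np.ndarray] = {}
        self.encode_calls = 0  # counts how often the encoder actually runs

    def state(self, doc_id: str) -> np.ndarray:
        if doc_id not in self.states:  # encode only on a cache miss
            self.states[doc_id] = toy_encode(self.corpus[doc_id])
            self.encode_calls += 1
        return self.states[doc_id]

    def soup(self, doc_ids: list[str]) -> np.ndarray:
        # Average-pool the cached per-document states into one context state.
        return np.mean([self.state(d) for d in doc_ids], axis=0)


corpus = {"d1": [1, 2, 3], "d2": [4, 5], "d3": [6, 7, 8, 9]}
cache = SoupCache(corpus)
h_q1 = cache.soup(["d1", "d2"])  # encodes d1 and d2
h_q2 = cache.soup(["d2", "d3"])  # reuses d2, encodes only d3
print(cache.encode_calls)  # 3
```

With concatenation, the second query would instead re-encode the full `d2 + d3` sequence; here the cost of a new query grows only with the number of previously unseen documents.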

Abstract

We investigate whether hidden states from Structured State Space Models (SSMs) can be merged post hoc to support downstream reasoning. Inspired by model souping, we study document souping, a strategy in which documents are encoded independently and their representations are pooled, via simple operations such as averaging, into a single context state. This approach enables modular encoding and reuse without reprocessing the full input for each query. We demonstrate that finetuned Mamba2 models with souped representations achieve competitive or superior performance compared to the standard monolithic encoding approach across multi-hop QA, sparse retrieval, and long-document reasoning tasks. For example, on the RACE and QuALITY benchmarks for long-document question answering, this method substantially outperforms the traditional concatenation approach. Crucially, this modular design scales to hundreds of documents while delivering substantial savings in inference cost, unlocking new possibilities for large-scale corpus reasoning.
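To make the contrast with monolithic encoding concrete, consider a toy linear time-invariant SSM, h_t = A h_{t-1} + B x_t, started from a zero state. This is an assumption for illustration only: Mamba2's recurrence is input-dependent (selective), so the exact identity checked here does not carry over to it. For an LTI recurrence, the state after the concatenated input equals a sum of the independently encoded document states, each multiplied by A raised to the number of tokens that follow that document. Averaging replaces this order-dependent decay weighting with uniform weights, so the soup no longer depends on document order. The sizes, seed, and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 3  # toy state and input sizes (assumptions)
A = np.diag(rng.uniform(0.5, 0.95, N))  # stable diagonal transition matrix
B = rng.standard_normal((N, D))


def encode(x: np.ndarray) -> np.ndarray:
    """Run h_t = A h_{t-1} + B x_t over the rows of x from a zero state."""
    h = np.zeros(N)
    for x_t in x:
        h = A @ h + B @ x_t
    return h


docs = [rng.standard_normal((L, D)) for L in (5, 3, 7)]

# Monolithic encoding: one pass over the concatenated sequence.
h_concat = encode(np.concatenate(docs, axis=0))

# Independent encoding: each document from a zero state.
h_docs = [encode(d) for d in docs]

# LTI identity: document i's state is decayed by A**(tokens after document i).
tail = [sum(len(d) for d in docs[i + 1:]) for i in range(len(docs))]
h_recon = sum(np.linalg.matrix_power(A, t) @ h for t, h in zip(tail, h_docs))

# Souping by averaging: uniform weights, independent of document order.
h_soup = np.mean(h_docs, axis=0)
print(np.allclose(h_concat, h_recon))  # True
```

In this toy setting, concatenation down-weights earlier documents geometrically, while the averaged soup treats every document equally; whether a trained model can exploit the souped state is exactly what the finetuning experiments address.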

Paper Structure

This paper contains 50 sections, 3 equations, 3 figures, 12 tables, 2 algorithms.

Figures (3)

  • Figure 1: Computation Graphs for Corpus Encoding. Top: In traditional concatenation-based encoding, all documents $\{d_1, \dots, d_k\}$, the query $q$, and the answer $a$ are flattened into a single input sequence and processed end-to-end by an SSM, so any change to the input requires re-encoding the entire sequence. Bottom: In the state-souping approach we study, each document $d_i$ is encoded independently by a shared SSM, producing per-document hidden states $\{h_1, \dots, h_k\}$ that are pooled into a single representation $h_{\text{soup}}$ (e.g., via sum or average). This pooled state is then used, alongside the query $q$, to drive downstream prediction. The design supports parallel encoding, modular reuse, and post hoc corpus composition.
  • Figure 2: Exact Match (EM) scores on HotpotQA for Mamba2-8B evaluated with increasing numbers of input documents. Each line represents a model trained on 5 documents (2 gold + 3 distractors) with a different pooling and normalization configuration. Soup w/ Average remains consistently more robust than the other configurations across all tested document counts.
  • Figure 3: F1 scores on HotpotQA for Mamba2-8B using the same experimental setup as Figure 2.
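Figure 1 names sum and average as pooling operators, and Figures 2 and 3 compare pooling and normalization configurations as the number of input documents grows. The sketch contrasts these operators on synthetic per-document states. The synthetic states (a shared component plus small noise) are an assumption, and `average_l2` is a hypothetical normalization variant that is not necessarily one of the configurations evaluated in the paper. In this toy setup, the norm of the summed state grows roughly linearly with the number of documents k, while the averaged state's norm stays nearly constant. This offers one possible intuition, not a claim from the source, for why an average-pooled state may stay in a familiar range when the test-time document count differs from the 5 documents seen in training.

```python
import numpy as np


def pool(states: np.ndarray, mode: str = "average") -> np.ndarray:
    """Pool a (k, d) stack of per-document states into one (d,) soup state.

    'sum' and 'average' follow Figure 1; 'average_l2' is a hypothetical
    normalization variant included only for comparison.
    """
    if mode == "sum":
        return states.sum(axis=0)
    if mode == "average":
        return states.mean(axis=0)
    if mode == "average_l2":
        avg = states.mean(axis=0)
        return avg / (np.linalg.norm(avg) + 1e-8)  # unit-norm average
    raise ValueError(f"unknown pooling mode: {mode}")


rng = np.random.default_rng(0)
base = rng.standard_normal(16)  # component shared by all documents (toy assumption)

norms = {}
for k in (5, 20, 80):  # number of documents being souped
    states = base + 0.1 * rng.standard_normal((k, 16))
    norms[k] = (
        float(np.linalg.norm(pool(states, "sum"))),
        float(np.linalg.norm(pool(states, "average"))),
    )
    print(k, norms[k])  # sum norm grows with k; average norm stays near |base|
```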