
Bounded State in an Infinite Horizon: Proactive Hierarchical Memory for Ad-Hoc Recall over Streaming Dialogues

Bingbing Wang, Jing Li, Ruifeng Xu

TL;DR

The authors propose ProStream, a proactive hierarchical memory framework for streaming dialogues. It supports ad-hoc memory recall on demand by reasoning over continuous streams with multi-granular distillation, and it maintains a bounded knowledge state that lowers inference latency without sacrificing reasoning fidelity.

Abstract

Real-world dialogue usually unfolds as an infinite stream, which calls for bounded-state memory mechanisms that can operate over an infinite horizon. However, existing read-then-think memory is fundamentally misaligned with this setting, because it cannot support ad-hoc memory recall while the stream is still unfolding. To study this challenge, we introduce STEM-Bench, the first benchmark for STreaming Evaluation of Memory. It comprises over 14K QA pairs embedded in dialogue streams that assess perception fidelity, temporal reasoning, and global awareness under infinite-horizon constraints. A preliminary analysis on STEM-Bench reveals a critical fidelity-efficiency dilemma: retrieval-based methods operate on fragmented context, while full-context models incur unbounded latency. To resolve this, we propose ProStream, a proactive hierarchical memory framework for streaming dialogues. It enables ad-hoc memory recall on demand by reasoning over the continuous stream with multi-granular distillation. Moreover, it employs Adaptive Spatiotemporal Optimization to dynamically adjust what is retained based on expected utility, yielding a bounded knowledge state that lowers inference latency without sacrificing reasoning fidelity. Experiments show that ProStream outperforms baselines in both accuracy and efficiency.
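
The central property claimed above, a knowledge state whose size does not grow with the stream, can be illustrated with a minimal sketch. It assumes a two-level hierarchy: a fixed-size buffer of raw turns and a capped queue of coarser summaries. The class name, the capacities, and the truncation-based `distill` placeholder are hypothetical and are not ProStream's actual components.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HierarchicalStreamMemory:
    """Hypothetical two-level bounded memory: a buffer of raw turns
    (fine granularity) plus a capped queue of segment summaries (coarse)."""

    turn_capacity: int = 4      # assumed size of the fine-grained turn buffer
    summary_capacity: int = 3   # assumed cap on coarse-grained summaries
    turns: deque = field(default_factory=deque)
    summaries: deque = field(default_factory=deque)

    def distill(self, segment):
        # Placeholder distillation: a real system would call a summarizer model.
        return " | ".join(turn[:20] for turn in segment)

    def ingest(self, turn):
        self.turns.append(turn)
        if len(self.turns) > self.turn_capacity:
            # Compress the oldest half of the buffer into one coarser summary.
            half = max(1, self.turn_capacity // 2)
            segment = [self.turns.popleft() for _ in range(half)]
            self.summaries.append(self.distill(segment))
            if len(self.summaries) > self.summary_capacity:
                # Simple FIFO eviction keeps the summary level bounded.
                self.summaries.popleft()

    def context(self):
        # Context assembled at query time never exceeds the two capacities.
        return list(self.summaries) + list(self.turns)


mem = HierarchicalStreamMemory()
for i in range(100):
    mem.ingest(f"turn {i}")
print(mem.context())
```

Because `context()` returns at most `turn_capacity + summary_capacity` items, the work needed to answer a query at any point stays constant as the stream grows, which is the property a bounded-state design targets. FIFO eviction is used here only for brevity; the abstract instead describes retention driven by expected utility.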

Paper Structure

This paper contains 35 sections, 3 theorems, 8 equations, 8 figures, 3 tables, and 1 algorithm.

Key Result

Proposition 1.3

According to Anderson's Rational Analysis of Memory (anderson2013adaptive), the probability $P$ that a memory trace $v$ will be needed is determined by its history of past use and by the current context. By defining the utility $u_{v,t}$ as a linear combination of frequency (History) and temporal proximity (Context), ProStream effectively maximizes the Expected Recall Probability under a strict resource constraint.
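
Proposition 1.3 can be made concrete with a small sketch of utility-based retention. It assumes a per-trace utility $u_{v,t} = \alpha \cdot \text{frequency} + \beta \cdot \text{proximity}$, with proximity taken as $1/(1 + \text{age})$, and a budget that simply counts retained traces. The weights `alpha` and `beta`, the proximity form, and the names `Trace`, `utility`, and `retain` are illustrative assumptions, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    key: str
    frequency: int   # History: how often the trace has been accessed so far
    last_seen: int   # turn index of the most recent access


def utility(trace, t, alpha=1.0, beta=1.0):
    """Hypothetical u_{v,t}: linear combination of frequency (History)
    and temporal proximity (Context), with proximity = 1 / (1 + age)."""
    proximity = 1.0 / (1 + (t - trace.last_seen))
    return alpha * trace.frequency + beta * proximity


def retain(traces, t, budget, alpha=1.0, beta=1.0):
    """Keep the `budget` traces with the highest utility at turn t."""
    ranked = sorted(traces, key=lambda v: utility(v, t, alpha, beta), reverse=True)
    return ranked[:budget]


traces = [Trace("a", 5, 0), Trace("b", 1, 9), Trace("c", 2, 8)]
print([v.key for v in retain(traces, t=10, budget=2)])             # ['a', 'c']
print([v.key for v in retain(traces, t=10, budget=2, alpha=0.1)])  # ['b', 'a']
```

Lowering `alpha` shifts weight from history to recency, so the recently accessed trace `b` displaces `c`. Under a pure count budget, keeping the top-`budget` traces by utility maximizes their summed utility; if the budget instead counted tokens per trace, selection would become a knapsack-style problem where sorting alone is no longer exact.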

Figures (8)

  • Figure 1: Comparison of the read-then-think paradigm (left) and the streaming memory paradigm (right) based on TBBT dialogues.
  • Figure 2: Overview of STEM-Bench Benchmark Curation. (a) Taxonomy of cognitive dimensions and tasks, where QA pairs are categorized by cognitive challenges (HFP, SLR, DGA) and distinct task types. (b) Data construction pipeline of the STEM-Bench dataset.
  • Figure 3: Preliminary analysis of RAG and full-context performance. (Top) Average accuracy across all performance metrics as a function of evidence distance. (Bottom) Inference latency over dialogue turns. The dashed lines indicate the overall averages.
  • Figure 4: Overview of our ProStream framework. Its four components are discussed in turn, from the buffering subsection through the reasoning subsection.
  • Figure 5: Average performance metrics (left) and latency (right) of varying LLM backbones. The numbers above are the percentage improvement of ProStream over the Full-Context baseline.
  • ...and 3 more figures

Theorems & Definitions (7)

  • Definition 1.1: Memory State and Budget
  • Definition 1.2: The Optimization Objective
  • Proposition 1.3: Connection to Rational Analysis of Memory
  • Theorem 1.4: Approximation Ratio
  • proof
  • Theorem 1.5: Bounded Time Complexity
  • proof