Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs

Peidong Liu, Junjiang Lin, Shaowen Wang, Yao Xu, Haiqing Li, Xuhao Xie, Siyi Wu, Hao Li

TL;DR

The paper tackles decision-making under high-dimensional, external context by introducing an information-theoretic, LLM-based summarization pipeline that produces compact context summaries $C_t$ to augment CMDP states. By formalizing context sufficiency through mutual information $I(S;C)$ and entropy $H(C_t)$, it derives regret and latency bounds with a bi-criteria objective that balances information value against computational cost. Empirically, summarized-context agents outperform raw-context and no-context baselines across diverse domains (discrete, continuous, visual, and recommendation) while improving sample efficiency and reducing latency and memory usage. The results demonstrate a scalable, interpretable framework that leverages LLMs for principled context processing in complex, resource-constrained environments, with transferability and robustness analyses supporting practical deployment.

Abstract

Contextual Markov Decision Processes (CMDPs) offer a framework for sequential decision-making under external signals, but existing methods often fail to generalize in high-dimensional or unstructured contexts, resulting in excessive computation and unstable performance. We propose an information-theoretic summarization approach that uses large language models (LLMs) to compress contextual inputs into low-dimensional, semantically rich summaries. These summaries augment states by preserving decision-critical cues while reducing redundancy. Building on the notion of approximate context sufficiency, we provide, to our knowledge, the first regret bounds and a latency-entropy trade-off characterization for CMDPs. Our analysis clarifies how informativeness impacts computational cost. Experiments across discrete, continuous, visual, and recommendation benchmarks show that our method outperforms raw-context and non-context baselines, improving reward, success rate, and sample efficiency, while reducing latency and memory usage. These findings demonstrate that LLM-based summarization offers a scalable and interpretable solution for efficient decision-making in context-rich, resource-constrained environments.
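
The quantities named above, the summary entropy $H(C_t)$ and the mutual information $I(S;C)$, can be estimated from paired samples when states and summaries are discrete. The following is a minimal sketch under that assumption, using plug-in (empirical frequency) estimators. It is not the paper's estimator. The `bicriteria_score` function, its weight `lam`, and the `cost` argument are hypothetical stand-ins for the information-versus-cost trade-off, whose exact form is not given in this summary.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def bicriteria_score(info_bits, cost, lam=0.1):
    """Hypothetical scalarization: information value minus weighted cost."""
    return info_bits - lam * cost


# Toy data (assumed for illustration): four observed states and two
# candidate summaries of the same context.
states = ["a", "a", "b", "b"]
summary_good = ["x", "x", "y", "y"]  # determines the state exactly
summary_bad = ["x", "y", "x", "y"]   # statistically independent of the state

for name, summary in [("good", summary_good), ("bad", summary_bad)]:
    info = mutual_information(states, summary)
    print(name, "H(C) =", entropy(summary), "I(S;C) =", info)
```

On this toy data both summaries have the same entropy (1 bit), but only the first carries information about the state: it recovers the full 1 bit of state entropy, while the second yields 0 bits. This is the distinction a sufficiency criterion based on $I(S;C)$ is meant to capture.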

Paper Structure

This paper contains 47 sections, 1 theorem, 22 equations, 4 figures, 8 tables, 1 algorithm.

Key Result

Proposition 1

Suppose the approximate context sufficiency condition (eq:suff_kl) holds and the value and policy classes admit an effective dimension $d_\mathrm{eff}$ (e.g., the effective rank of the feature covariance). Then for constants $c_1,c_2>0$,

Figures (4)

  • Figure 1: Overview of the summarization-based Contextual MDP (CMDP) framework. Summaries $C_t$ condense history $H_t$ and exogenous signals $E_t$ to guide action selection and transitions. The dashed loop indicates continual summary updates.
  • Figure 2: Performance analysis: (a) Learning curves showing convergence behavior, (b) Information-theoretic validation of the framework.
  • Figure 3: Decision latency vs. context entropy $H(C_t)$ under varying token budgets and update policies.
  • Figure 4: Performance vs. computational budget (latency, tokens). Our method Pareto-dominates Raw Context across regimes.

Theorems & Definitions (1)

  • Proposition 1: Refined regret bound
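
Proposition 1 refers to an effective dimension $d_\mathrm{eff}$, with the effective rank of the feature covariance given as an example. This summary does not say how the paper computes it. One common proxy, assumed here purely for illustration, is the participation ratio $(\operatorname{tr}\Sigma)^2 / \operatorname{tr}(\Sigma^2)$, which equals $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over the eigenvalues $\lambda_i$ of the covariance $\Sigma$ and needs no eigendecomposition. The function names and toy feature rows below are hypothetical.

```python
def covariance(rows):
    """Sample covariance matrix (unbiased, n - 1 denominator) of feature rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def participation_ratio(cov):
    """(tr S)^2 / tr(S^2); for symmetric S, tr(S^2) is the squared Frobenius norm."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    frob_sq = sum(v * v for row in cov for v in row)
    return trace * trace / frob_sq


# Two uncorrelated, equal-variance features: variance spreads over 2 directions.
spread = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
# Two perfectly correlated features: all variance lies along 1 direction.
collapsed = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0), (-2.0, -2.0)]

d_eff_spread = participation_ratio(covariance(spread))
d_eff_collapsed = participation_ratio(covariance(collapsed))
print(d_eff_spread, d_eff_collapsed)
```

The first feature set gives an effective dimension of 2 and the second gives 1, even though both have two nominal coordinates. Under this reading, a summary that removes redundant context directions lowers $d_\mathrm{eff}$.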