Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs
Peidong Liu, Junjiang Lin, Shaowen Wang, Yao Xu, Haiqing Li, Xuhao Xie, Siyi Wu, Hao Li
TL;DR
The paper tackles decision-making under high-dimensional, external context by introducing an information-theoretic, LLM-based summarization pipeline that produces compact context summaries $C_t$ to augment CMDP states. By formalizing context sufficiency through mutual information $I(S;C)$ and entropy $H(C_t)$, it derives regret and latency bounds with a bi-criteria objective that balances information value against computational cost. Empirically, summarized-context agents outperform raw-context and no-context baselines across diverse domains (discrete, continuous, visual, and recommendation) while improving sample efficiency and reducing latency and memory usage. The results demonstrate a scalable, interpretable framework that leverages LLMs for principled context processing in complex, resource-constrained environments, with transferability and robustness analyses supporting practical deployment.
Abstract
Contextual Markov Decision Processes (CMDPs) offer a framework for sequential decision-making under external signals, but existing methods often fail to generalize in high-dimensional or unstructured contexts, resulting in excessive computation and unstable performance. We propose an information-theoretic summarization approach that uses large language models (LLMs) to compress contextual inputs into low-dimensional, semantically rich summaries. These summaries augment states by preserving decision-critical cues while reducing redundancy. Building on the notion of approximate context sufficiency, we provide, to our knowledge, the first regret bounds and a latency-entropy trade-off characterization for CMDPs. Our analysis clarifies how the informativeness of a summary affects computational cost. Experiments across discrete, continuous, visual, and recommendation benchmarks show that our method outperforms raw-context and no-context baselines, improving reward, success rate, and sample efficiency while reducing latency and memory usage. These findings demonstrate that LLM-based summarization offers a scalable and interpretable solution for efficient decision-making in context-rich, resource-constrained environments.
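As a rough illustration of the two quantities the analysis is built on, the sketch below computes the entropy $H(C)$ of a discrete summary variable and the mutual information $I(S;C)$ between state and summary from a small joint probability table. The function names, the toy tables, and the `cost_weight` trade-off parameter are assumptions made for this example only; the scalar score is a hypothetical stand-in for a bi-criteria objective and is not the paper's actual formulation.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(S;C) in bits, where joint[s][c] = P(S=s, C=c)."""
    p_s = [sum(row) for row in joint]        # marginal over states
    p_c = [sum(col) for col in zip(*joint)]  # marginal over summaries
    mi = 0.0
    for s, row in enumerate(joint):
        for c, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_s[s] * p_c[c]))
    return mi


def bi_criteria_score(joint, cost_weight):
    """Hypothetical score: information value minus an entropy-based cost.

    cost_weight is an illustrative knob, not a parameter from the paper.
    """
    p_c = [sum(col) for col in zip(*joint)]
    return mutual_information(joint) - cost_weight * entropy(p_c)


# Toy case 1: the summary determines the state exactly.
informative = [[0.5, 0.0],
               [0.0, 0.5]]
# Toy case 2: the summary is independent of the state.
uninformative = [[0.25, 0.25],
                 [0.25, 0.25]]

mi_informative = mutual_information(informative)      # 1 bit
mi_uninformative = mutual_information(uninformative)  # 0 bits
h_summary = entropy([0.5, 0.5])                       # 1 bit in both cases
score_informative = bi_criteria_score(informative, cost_weight=0.5)
score_uninformative = bi_criteria_score(uninformative, cost_weight=0.5)
```

Both toy summaries have the same entropy, so under this illustrative score they incur the same cost; only the informative one recovers that cost through its mutual information with the state.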
