
Directed Information $γ$-covering: An Information-Theoretic Framework for Context Engineering

Hai Huang

TL;DR

The paper tackles the challenge of selecting, compressing, and diversifying context for LLMs under budgets by introducing Directed Information γ-covering, a principled, query-agnostic framework that leverages directional predictive relationships among context chunks. It defines γ-covering via DI, develops a greedy submodular optimization with strong approximation guarantees, and establishes soundness and diversity properties, all while enabling offline precomputation that amortizes online cost. Empirically, the approach improves context compression, system prompt selection, and reranking on HotpotQA, with DIG-R diffusion-based reranking showing consistent gains when integrated with a strong retriever. The work demonstrates that self-organizing information-theoretic principles can stabilize and improve modern LLM pipelines, particularly under hard decision regimes and tight budgets, and points to future exploration in redundancy-rich and long-context settings.

Abstract

We introduce \textbf{Directed Information $\gamma$-covering}, a simple but general framework for redundancy-aware context engineering. Directed information (DI), a causal analogue of mutual information, measures asymmetric predictiveness between chunks. If $\operatorname{DI}_{i \to j} \ge H(C_j) - \gamma$, then $C_i$ suffices to represent $C_j$ up to $\gamma$ bits. Building on this criterion, we formulate context selection as a $\gamma$-cover problem and propose a greedy algorithm with provable guarantees: it preserves query information within bounded slack, inherits the $(1+\ln n)$ approximation of greedy submodular set cover and the $(1-1/e)$ approximation of greedy submodular maximization under a budget, and enforces a diversity margin. Importantly, building the $\gamma$-cover is \emph{query-agnostic}: it incurs no online cost, because it can be computed once offline and amortized across all queries. Experiments on HotpotQA show that $\gamma$-covering consistently improves over BM25, a competitive baseline, and provides clear advantages in hard-decision regimes such as context compression and single-slot prompt selection. These results establish DI $\gamma$-covering as a principled, self-organizing backbone for modern LLM pipelines.
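The covering criterion and the greedy selection described above can be illustrated with a short Python sketch. It assumes the pairwise DI estimates and chunk entropies (in bits) are already available as plain lists; the function names `gamma_edges` and `greedy_gamma_cover`, the convention that every chunk covers itself, and the toy numbers are illustrative assumptions, not the paper's algorithm or data. Without a budget the loop is classic greedy set cover; with a budget `k` it becomes greedy maximum coverage.

```python
from typing import List, Optional, Set


def gamma_edges(di: List[List[float]], entropy: List[float], gamma: float) -> List[Set[int]]:
    """Return cover[i], the set of chunks j that chunk i gamma-covers.

    An edge i -> j exists when di[i][j] >= entropy[j] - gamma, i.e. at most
    gamma bits of C_j remain unexplained by C_i.
    Assumption (hypothetical convention): every chunk trivially covers itself.
    """
    n = len(entropy)
    cover: List[Set[int]] = []
    for i in range(n):
        covered = {i}
        for j in range(n):
            if j != i and di[i][j] >= entropy[j] - gamma:
                covered.add(j)
        cover.append(covered)
    return cover


def greedy_gamma_cover(cover: List[Set[int]], budget: Optional[int] = None) -> List[int]:
    """Greedily pick the chunk that covers the most still-uncovered chunks.

    budget=None: greedy set cover, stops once every chunk is covered.
    budget=k:    greedy maximum coverage, stops after k picks.
    """
    n = len(cover)
    uncovered = set(range(n))
    chosen: List[int] = []
    while uncovered and (budget is None or len(chosen) < budget):
        # Largest marginal gain first; ties go to the lowest index.
        best = max(range(n), key=lambda i: (len(cover[i] & uncovered), -i))
        gain = cover[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Toy, made-up DI matrix (bits) and entropies for four chunks.
    di = [
        [3.0, 2.8, 1.9, 0.5],
        [2.0, 3.0, 1.0, 0.2],
        [0.5, 0.5, 2.0, 0.1],
        [1.0, 1.0, 1.8, 4.0],
    ]
    entropy = [3.0, 3.0, 2.0, 4.0]
    cover = gamma_edges(di, entropy, gamma=0.5)
    print(greedy_gamma_cover(cover))            # [0, 3]
    print(greedy_gamma_cover(cover, budget=1))  # [0]
```

Neither function takes a query as input, which mirrors the point that the cover can be built once offline and reused across queries; the query-dependent parts of the paper's pipeline are not modeled here.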

Paper Structure

This paper contains 24 sections, 10 theorems, 41 equations, 6 tables, 2 algorithms.

Key Result

Lemma 3.1

For any $q,C_i,C_j$ under $p^\star$,

Theorems & Definitions (22)

  • Definition 3.1: Task-conditioned PMI
  • Definition 3.2: Directed Information [massey1990causality]
  • Lemma 3.1: PMI coupling bounds
  • Corollary 3.1.1: Pruning rule
  • Corollary 3.1.2: Promotion rule
  • Definition 3.3: Empirical predictiveness
  • Proof: Sketch
  • Theorem 3.3: Safe pruning under estimation error
  • Definition 3.4: $\gamma$-covering edge
  • Definition 3.5: $\gamma$-coverage set
  • ...and 12 more
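The statements of Definitions 3.1 and 3.2 are not reproduced on this page, so the block below gives only the standard textbook forms they build on; the paper's task-conditioned, query-dependent variants may differ, and reading $\operatorname{DI}_{i \to j}$ as Massey's $I(C_i \to C_j)$ is an assumption. The last line rewrites the abstract's covering criterion: since directed information equals the target entropy minus the causally conditioned entropy, the edge $i \to j$ holds exactly when at most $\gamma$ bits of $C_j$ remain unexplained.

```latex
% Pointwise mutual information (standard, unconditioned form)
\operatorname{PMI}(x; y) = \log \frac{p(x, y)}{p(x)\, p(y)}

% Directed information (Massey, 1990) and causally conditioned entropy
I(X^n \to Y^n) = \sum_{t=1}^{n} I(X^t; Y_t \mid Y^{t-1})
              = H(Y^n) - H(Y^n \,\|\, X^n),
\qquad
H(Y^n \,\|\, X^n) = \sum_{t=1}^{n} H(Y_t \mid Y^{t-1}, X^t) \ge 0

% gamma-covering edge criterion, assuming DI_{i->j} = I(C_i -> C_j)
\operatorname{DI}_{i \to j} \ge H(C_j) - \gamma
\iff H(C_j) - \operatorname{DI}_{i \to j} \le \gamma
\iff H(C_j \,\|\, C_i) \le \gamma
```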