Table of Contents
Fetching ...

The Derivation Penalty in Premise-Erasure Caching: Capacity, Strong Converse, and Dispersion Dichotomy

Jianfeng Xu

TL;DR

An information-theoretic framework for caching in derivation-based reasoning engines under independent premise erasure, and the ratio of the derivation-constrained cache to the coded cache converges to the reciprocal of the erasure rate, universally across query counts, overlap structures, and reliability targets.

Abstract

We introduce an information-theoretic framework for caching in derivation-based reasoning engines under independent premise erasure. Two decoder models are compared: a coded scheme using an arbitrary bit-string cache with a general-purpose decoder, and a derivation-constrained scheme where the cache consists of logical facts and the decoder must produce a valid proof. Four coding theorems are established. The first proves that each derivation step carries a universal per-step information content determined by the base size. The second reveals an exponential capacity separation between linear-chain and balanced-merge Datalog architectures at equal depth. The third identifies a critical access frequency separating the regimes where caching and on-demand derivation are optimal. The fourth determines the minimum derivation-constrained cache under erasure, decomposing query information into reliable cache and noisy channel capacity. The central result is the derivation penalty: the ratio of the derivation-constrained cache to the coded cache converges to the reciprocal of the erasure rate, universally across query counts, overlap structures, and reliability targets. This penalty originates from a structural caching rigidity theorem showing that only cache facts within the target query's derivation DAG contribute to resilience, precluding cross-coordinate error correction. Beyond capacity, we prove a strong converse at the KL-divergence rate with Bahadur--Rao prefactors, a dispersion dichotomy (positive coded dispersion versus zero derivation-constrained dispersion), and a complete eight-regime phase diagram. The architecture-dependent depth-to-dependency mapping yields exponentially sharper phase transitions for the merge architecture. All results transfer across synonymous representations.

The Derivation Penalty in Premise-Erasure Caching: Capacity, Strong Converse, and Dispersion Dichotomy

TL;DR

An information-theoretic framework for caching in derivation-based reasoning engines under independent premise erasure, and the ratio of the derivation-constrained cache to the coded cache converges to the reciprocal of the erasure rate, universally across query counts, overlap structures, and reliability targets.

Abstract

We introduce an information-theoretic framework for caching in derivation-based reasoning engines under independent premise erasure. Two decoder models are compared: a coded scheme using an arbitrary bit-string cache with a general-purpose decoder, and a derivation-constrained scheme where the cache consists of logical facts and the decoder must produce a valid proof. Four coding theorems are established. The first proves that each derivation step carries a universal per-step information content determined by the base size. The second reveals an exponential capacity separation between linear-chain and balanced-merge Datalog architectures at equal depth. The third identifies a critical access frequency separating the regimes where caching and on-demand derivation are optimal. The fourth determines the minimum derivation-constrained cache under erasure, decomposing query information into reliable cache and noisy channel capacity. The central result is the derivation penalty: the ratio of the derivation-constrained cache to the coded cache converges to the reciprocal of the erasure rate, universally across query counts, overlap structures, and reliability targets. This penalty originates from a structural caching rigidity theorem showing that only cache facts within the target query's derivation DAG contribute to resilience, precluding cross-coordinate error correction. Beyond capacity, we prove a strong converse at the KL-divergence rate with Bahadur--Rao prefactors, a dispersion dichotomy (positive coded dispersion versus zero derivation-constrained dispersion), and a complete eight-regime phase diagram. The architecture-dependent depth-to-dependency mapping yields exponentially sharper phase transitions for the merge architecture. All results transfer across synonymous representations.
Paper Structure (71 sections, 75 theorems, 33 equations, 4 figures, 1 table)

This paper contains 71 sections, 75 theorems, 33 equations, 4 figures, 1 table.

Key Result

Proposition 2.1

Under Assumptions assump:finite-so and assump:core-extractable, $A:=\text{Atom}(S_O)$ satisfies: (i)$\text{Cn}(A)=\text{Cn}(S_O)$; (ii)$a\notin\text{Cn}(A\setminus\{a\})$ for all $a\in A$; (iii)$A$ is uniquely determined by $S_O$ and the canonical order.

Figures (4)

  • Figure 1: Logical dependency map. Solid arrows denote primary dependencies; the dashed arrow indicates CT2's architecture mapping $d\mapsto\kappa_{\mathcal{A}}(d)$ feeding the depth-space re-parameterization of §\ref{['sec:arch-depth']}. The $1/\varepsilon$ derivation penalty (Theorem \ref{['thm:source-channel-separation']}) arises from comparing CT4's derivation-constrained converse---enabled by the structural rigidity of Theorem \ref{['thm:ancestral-relevance']}---with the coded caching achievability over $\mathrm{BEC}_m(\varepsilon)$. Representation invariance (Theorem \ref{['thm:CT-transfer']}) ensures all coding theorems transfer across synonymous representations.
  • Figure 2: Penalty ratios and convergence to $1/\varepsilon$ (Experiments 1 and 5). All legends are placed outside the plotting regions to avoid covering curves.
  • Figure 3: Strong converse exponent and Bahadur--Rao prefactor validation (Experiment 2). The left panel shows convergence $\hat{D}(\kappa)\to D(\alpha\|\varepsilon)$ as in Theorem \ref{['thm:sc-exponent']}; the right panel validates the $\tfrac{1}{2}\log_2\kappa$ prefactor scaling in Theorem \ref{['thm:bahadur-rao']}. Legends are placed outside the plotting regions.
  • Figure 4: Experiments 3--4. Left: coded second-order correction $\Delta_c$ stabilizes at the dispersion-predicted constant, while the derivation-constrained correction is identically zero (Theorem \ref{['thm:unc-second-order']}). Middle: coded phase transition sharpens around $\rho=\varepsilon=0.3$ (vertical dotted line) as $\kappa$ increases; the log-scale $y$-axis reveals the exponential error decay in the super-capacity regime. Right: architecture-dependent depth--resilience gap grows as $\varepsilon\to 0$.

Theorems & Definitions (134)

  • Definition 2.1: Semantic atomic core
  • Proposition 2.1: Core correctness
  • Definition 2.2: Intrinsic vs. operational premise bases
  • Proposition 2.2: Finite acyclic predecessor unfolding
  • Definition 2.3: Base-relative derivation depth
  • Theorem 2.1: Well-definedness and computability of $\text{Dd}$
  • Definition 2.4: Intrinsic and operational depths
  • Definition 2.5: Canonical derivation DAG
  • Definition 2.6: Depth strata and width profile
  • Definition 2.7: Dependency count vs. head arity
  • ...and 124 more