Semantic Rate Distortion and Posterior Design: Compute Constraints, Multimodality, and Strategic Inference
Emrah Akyol
TL;DR
The paper develops a unified information-theoretic framework for semantic inference under rate and compute constraints with misaligned encoder/decoder objectives. By modeling a Gaussian latent state $X$ and a semantic variable $\Theta = BX + V$, it derives exact posterior-covariance characterizations of the strategic rate–distortion function across direct, remote, and full-information encoding regimes, including semantic waterfilling and Gaussian persuasion in the rate-constrained setting. It shows that architectural compute limits act as implicit rate budgets, yielding exponential improvements in semantic precision with depth and inference time, and that multimodal observations remove a geometric-mean penalty inherent to remote encoding. The results offer a principled interpretation of modern multimodal AI and scaling laws as posterior-design problems, with practical implications for energy- and data-efficient systems and alignment-aware communication strategies.
Abstract
We study strategic Gaussian semantic compression under rate and compute constraints, where an encoder and decoder optimize distinct quadratic objectives. A latent Gaussian state generates a task-dependent semantic variable, and the decoder best-responds via MMSE estimation, reducing the encoder's problem to posterior-covariance design under an information-rate constraint. We characterize the strategic rate-distortion function in the direct, remote, and full-information regimes, derive semantic waterfilling and rate-constrained Gaussian persuasion solutions, and establish Gaussian optimality under misaligned objectives. We further show that architectural compute limits act as implicit rate constraints, yielding exponential improvements in semantic accuracy with model depth and inference-time compute, while multimodal observation eliminates the geometric-mean penalty inherent to remote encoding. These results provide information-theoretic foundations for data- and energy-efficient AI and offer a principled interpretation of modern multimodal language models as posterior-design mechanisms under resource constraints.
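As background for the waterfilling and exponential-improvement claims, the following is a minimal sketch of the classical (non-strategic) reverse water-filling solution for independent Gaussian components under squared-error distortion. It is not the paper's semantic waterfilling or its strategic solution; the function name `reverse_waterfill`, the bisection approach, and the assumption of strictly positive variances are illustrative choices, not taken from the paper.

```python
import numpy as np

def reverse_waterfill(eigvals, rate_bits, iters=100):
    """Classical reverse water-filling (hypothetical helper, not the paper's method).

    eigvals:   strictly positive variances of independent Gaussian components
    rate_bits: total rate budget in bits
    Returns the per-component distortions min(level, variance).
    """
    lam = np.asarray(eigvals, dtype=float)
    if rate_bits <= 0:
        return lam.copy()  # no rate: distortion equals the prior variance
    lo, hi = 0.0, lam.max()
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        d = np.minimum(level, lam)
        rate = 0.5 * np.sum(np.log2(lam / d))
        if rate > rate_bits:
            lo = level  # budget exceeded: raise the water level
        else:
            hi = level  # budget met: try a lower water level
    return np.minimum(hi, lam)

# Scalar case: distortion follows sigma^2 * 2^(-2R), i.e. decays exponentially in rate.
for r in (1.0, 2.0, 3.0):
    print(r, reverse_waterfill([1.0], r)[0])
```

In the scalar case each additional bit of rate reduces the distortion by a factor of four, which is the classical form of the exponential rate-accuracy relationship that the abstract extends to depth and inference-time compute.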
