On Kolmogorov Structure Functions
Samuel Epstein
TL;DR
The paper investigates how Kolmogorov's structure function $\mathbf{H}_k(x)$ interacts with the Minimum Description Length (MDL) principle under the Independence Postulate (IP). It derives bounds linking $\mathbf{I}(x;\mathcal{H})$, the mutual information of $x$ with the halting sequence $\mathcal{H}$, to the growth of the structure function, and introduces the minimal sufficient statistic $k^*(x)$ with the bound $k^*(x) <^\log \mathbf{K}(\mathbf{K}(x)) + \mathbf{I}(x;\mathcal{H})$. It then discusses the difficulties with set-restricted variants, argues that unrestricted structure functions do not readily describe physical-world data, and highlights how IP makes many claims about structure functions purely mathematical, since the strings they concern cannot be found in nature. The discussion advocates carefully restricted model classes and encodings for meaningful two-part decompositions, and connects these ideas to measuring strings with long-running shortest programs.
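For orientation, the quantities named above can be written out as follows. This is a sketch that assumes the standard definitions from the Algorithmic Statistics literature: models are finite sets $S$, $\mathbf{K}$ is prefix complexity, and $<^\log$ denotes an inequality up to logarithmic additive terms. The paper's own definitions may differ in details such as the precision terms.

$$
\begin{aligned}
\mathbf{H}_k(x) &= \min\{\log|S| : x \in S,\ \mathbf{K}(S) \le k\},\\
\mathbf{I}(x;\mathcal{H}) &= \mathbf{K}(x) - \mathbf{K}(x \mid \mathcal{H}),\\
k^*(x) &= \min\{k : \mathbf{H}_k(x) + k <^\log \mathbf{K}(x)\}.
\end{aligned}
$$

Under these assumptions, the stated bound $k^*(x) <^\log \mathbf{K}(\mathbf{K}(x)) + \mathbf{I}(x;\mathcal{H})$ says that when $\mathbf{I}(x;\mathcal{H})$ is small, the two-part code length $\mathbf{H}_k(x) + k$ already comes close to $\mathbf{K}(x)$ at a very small model complexity $k$. This is the sense in which the structure function is "flat".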
Abstract
All strings with low mutual information with the halting sequence will have flat Kolmogorov Structure Functions, in the context of Algorithmic Statistics. Assuming the Independence Postulate, strings with non-negligible information with the halting sequence are purely mathematical constructions, and cannot be found in nature. Thus Algorithmic Statistics does not study strings in the physical world. This leads to the general thesis that two-part codes require limitations, as shown in the Minimum Description Length Principle. We also discuss issues with set-restricted Kolmogorov Structure Functions.
