Information Content and Entropy of Finite Patterns from a Combinatorial Perspective
Zsolt Pocze
TL;DR
This work presents a unified combinatorial framework for the information content and entropy of finite patterns, extending beyond traditional Shannon information. It defines the information content I(A) as the minimum number of binary decisions needed to specify a pattern and connects it to Kolmogorov complexity, while anchoring the theory with edge cases such as constant, uniformly random, and ergodic Markov patterns. The paper derives explicit formulas for these edge cases, establishes general bounds and properties (normalization, subadditivity, reversibility, monotonicity, redundancy), and introduces practical estimation via Kolmogorov-based, compression-based, and alternative measures. Entropy is then defined as H_C(A)=I(A)/(n+1), bridging short-pattern behavior with asymptotic Shannon entropy for ergodic Markov sources, and enabling robust analysis of diverse data types, including short sequences. The framework supports combining multiple estimation methods to improve accuracy and provides a foundation for broader applications in information theory and data analysis.
Abstract
We present a unified combinatorial definition of the information content and entropy of different types of patterns. The definition is compatible with the traditional concepts of information and entropy, and it goes beyond the limitations of Shannon information, which is interpretable for ergodic Markov processes. We compare the information content of various finite patterns and derive general properties of information quantity from these comparisons. Using these properties, we define normalized information estimation methods based on compression algorithms and Kolmogorov complexity. From a combinatorial point of view, we redefine the concept of entropy in a way that is asymptotically compatible with traditional entropy.
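As a rough illustration of the compression-based estimation idea mentioned in the abstract, the following minimal sketch compares a constant pattern with a uniformly random one. It is not the paper's normalized estimator: the function name `compressed_bits` is hypothetical, zlib is an assumed stand-in for "a compression algorithm", and the raw compressed size includes format overhead that a normalized method would have to account for.

```python
import os
import zlib


def compressed_bits(pattern: bytes) -> int:
    """Crude upper-bound proxy for information content (assumption, not the
    paper's definition): size in bits of the zlib-compressed pattern,
    including zlib's header and checksum overhead."""
    return 8 * len(zlib.compress(pattern, 9))


# Two of the edge cases named in the text: a constant pattern and a
# uniformly random pattern of the same length.
constant = b"a" * 1000
random_like = os.urandom(1000)

print("constant pattern:", compressed_bits(constant), "bits")
print("random pattern:  ", compressed_bits(random_like), "bits")
```

Under these assumptions the constant pattern should compress to far fewer bits than the random one, which is the qualitative ordering the edge cases are meant to anchor.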
