Assembly Theory is an approximation to algorithmic complexity based on LZ compression that does not explain selection or evolution
Felipe S. Abrahão, Santiago Hernández-Orozco, Narsis A. Kiani, Jesper Tegnér, Hector Zenil
TL;DR
This paper challenges Assembly Theory (AT) by showing that its core metric, the assembly index, is an instance of LZ-based compression, equivalent to the size of a minimal CFG and bounded by Shannon entropy $H$ and classical compression schemes, thereby situating AT within algorithmic information theory via the Kolmogorov complexity $\boldsymbol{K}$. It demonstrates that AT’s pathway complexity and assembly number do not provide a fundamentally new explanation for selection or evolution beyond what is captured by $H$, CFG-based grammar size, or other robust information-theoretic measures. The authors argue that AT relies on circular reasoning to claim links between physics and biology and that its empirical support is limited and not superior to existing methods, such as standard compression or entropy-based approaches. Overall, AT is presented as a constrained instantiation of well-established information-theoretic principles, with more sophisticated algorithmic measures likely offering better discrimination of structure, causality, and open-ended evolution in complex systems.
Abstract
We prove the full equivalence between Assembly Theory (AT) and Shannon Entropy via a method based upon the principles of statistical compression renamed 'assembly index' that belongs to the LZ family of popular compression algorithms (ZIP, GZIP, JPEG). Such popular algorithms have been shown to empirically reproduce the results of AT, results that have also been reported before in successful applications to separating organic from non-organic molecules and in the context of the study of selection and evolution. We show that the assembly index value is equivalent to the size of a minimal context-free grammar. The statistical compressibility of such a method is bounded by Shannon Entropy and other equivalent traditional LZ compression schemes, such as LZ77, LZ78, or LZW. In addition, we demonstrate that AT, and the algorithms supporting its pathway complexity, assembly index, and assembly number, define compression schemes and methods that are subsumed into the theory of algorithmic (Kolmogorov-Solomonoff-Chaitin) complexity. Due to AT's current lack of logical consistency in defining causality for non-stochastic processes and the lack of empirical evidence that it outperforms other complexity measures found in the literature capable of explaining the same phenomena, we conclude that the assembly index and the assembly number do not lead to an explanation or quantification of biases in generative (physical or biological) processes, including those brought about by (abiotic or Darwinian) selection and evolution, that could not have been arrived at using Shannon Entropy or that have not been reported before using classical information theory or algorithmic complexity.
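As a rough illustration of the two classical quantities the abstract refers to, the following Python sketch computes the empirical (zeroth-order) Shannon entropy of a string and its LZ78 phrase parsing. It is not the authors' code and does not compute the assembly index; the function names `shannon_entropy` and `lz78_phrases` are hypothetical and chosen here for illustration only. The sketch assumes plain strings as input and uses symbol frequencies as a simple stand-in for the entropy of the underlying source.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Empirical (zeroth-order) Shannon entropy of s, in bits per symbol."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def lz78_phrases(s: str) -> list:
    """Parse s into LZ78 phrases.

    Each phrase is the shortest prefix of the remaining input that is not
    yet in the dictionary, i.e. a previously seen phrase plus one new symbol.
    """
    dictionary = set()
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover input that only repeats an existing dictionary entry.
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    for text in ("abababababababab", "abcdefghijklmnop"):
        phrases = lz78_phrases(text)
        print(text, len(phrases), round(shannon_entropy(text), 3))
```

On a highly repetitive string the parser reuses earlier phrases, so it produces fewer phrases than on a string of the same length with no repeated blocks. The reuse of previously built blocks is the property that the abstract attributes to both LZ-type schemes and the assembly index. Note that zeroth-order entropy only sees symbol frequencies; capturing block structure requires block or conditional entropies, which dictionary parsers like this one approximate implicitly.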
