Assembly Theory is an approximation to algorithmic complexity based on LZ compression that does not explain selection or evolution

Felipe S. Abrahão, Santiago Hernández-Orozco, Narsis A. Kiani, Jesper Tegnér, Hector Zenil

TL;DR

This paper challenges Assembly Theory (AT) by showing that its core metric, the assembly index, is an instance of LZ-based compression, equivalent to the size of a minimal context-free grammar (CFG) and bounded by Shannon entropy $H$ and classical compression schemes, thereby situating AT within algorithmic information theory via the Kolmogorov complexity $\boldsymbol{K}$. It demonstrates that AT's pathway complexity and assembly number do not provide a fundamentally new explanation for selection or evolution beyond what is captured by $H$, CFG-based grammar size, or other robust information-theoretic measures. The authors argue that AT relies on circular reasoning to claim links between physics and biology, and that its empirical support is limited and not superior to existing methods such as standard compression or entropy-based approaches. Overall, AT is presented as a constrained instantiation of well-established information-theoretic principles, with more sophisticated algorithmic measures likely offering better discrimination of structure, causality, and open-ended evolution in complex systems.

Abstract

We prove the full equivalence between Assembly Theory (AT) and Shannon Entropy via a method based upon the principles of statistical compression renamed 'assembly index' that belongs to the LZ family of popular compression algorithms (ZIP, GZIP, JPEG). Such popular algorithms have been shown to empirically reproduce the results of AT, results that have also been reported before in successful applications to separating organic from non-organic molecules and in the context of the study of selection and evolution. We show that the assembly index value is equivalent to the size of a minimal context-free grammar. The statistical compressibility of such a method is bounded by Shannon Entropy and other equivalent traditional LZ compression schemes, such as LZ77, LZ78, or LZW. In addition, we demonstrate that AT, and the algorithms supporting its pathway complexity, assembly index, and assembly number, define compression schemes and methods that are subsumed into the theory of algorithmic (Kolmogorov-Solomonoff-Chaitin) complexity. Due to AT's current lack of logical consistency in defining causality for non-stochastic processes and the lack of empirical evidence that it outperforms other complexity measures found in the literature capable of explaining the same phenomena, we conclude that the assembly index and the assembly number do not lead to an explanation or quantification of biases in generative (physical or biological) processes, including those brought about by (abiotic or Darwinian) selection and evolution, that could not have been arrived at using Shannon Entropy or that have not been reported before using classical information theory or algorithmic complexity.
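
As a rough illustration of the kind of dictionary-based parsing and entropy estimate the abstract refers to, the following minimal Python sketch applies a textbook LZ78 parse and an empirical (symbol-frequency) Shannon entropy to the block-patterned string 111000111000 mentioned in the Figure 1 caption. It is not the authors' code and it does not compute the assembly index; the function names `lz78_phrases` and `shannon_entropy` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def lz78_phrases(s: str) -> list:
    """Textbook LZ78 parse: each phrase is the shortest prefix of the
    remaining input that has not been seen as a phrase before."""
    seen = set()
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover suffix that duplicates an earlier phrase.
        phrases.append(current)
    return phrases


def shannon_entropy(s: str) -> float:
    """Empirical order-0 entropy in bits per symbol (symbol frequencies only)."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Example string taken from the Figure 1 caption.
example = "111000111000"
phrases = lz78_phrases(example)
entropy = shannon_entropy(example)
```

For this string the parse yields six phrases (`1`, `11`, `0`, `00`, `111`, `000`), and the order-0 entropy is 1 bit per symbol because zeros and ones are equally frequent. The order-0 estimate ignores the block structure, which is one reason dictionary-based schemes are usually compared against entropy rates rather than single-symbol frequencies.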

Paper Structure (19 sections, 11 theorems, 30 equations, 3 figures)


Key Result

Lemma 1

Let $\mathcal{ S }$ be enumerable by an algorithm. Let $\mathbf{F}$ be an arbitrary formal theory that contains Assembly Theory, including all the decidable procedures of the chosen method for calculating the assembly index of an object for a nested subspace of $\mathcal{ S }$, and the program that where the function $c_\Gamma\left( y \right)$ gives the assembly index of the object $y$ in the ass

Figures (3)

  • Figure 1: A: The authors of AT have suggested that $\mathbf{K}$ would be proven to be contained in AT [fridman]. This Venn diagram shows how AT is connected to and subsumed within algorithmic complexity ($\mathbf{K}$) and within the group of statistical compression methods, as proven in this paper (see Sup. Inf.). B: Causal transition graph of Turing machine number 3019 (in Wolfram's enumeration scheme [Wolfram2002]) with an empty initial condition, found by using a computable method (e.g. CTM [zenil2011]) to explain how the block-patterned string 111000111000 was assembled step by step based on the principles of $\mathbf{K}$, describing the state, memory, and output of the process as a fully causal mechanistic explanation. A Turing machine is simply a procedural algorithm, and any algorithm can be represented by a Turing machine. By definition, this is a mechanistic process, and as physical as anything else, not an 'abstract' or 'unrealisable' process.
  • Figure 2: A timeline of results in complexity science relevant to the claims and results of AT. AT renames several concepts, e.g. dictionary trees as 'assembly pathways'; relies heavily on algorithmic probability in its reduction of combinatorial space arguments, without attribution; and, as demonstrated, the assembly index is an LZ compression scheme (proofs provided in the Sup. Inf.).
  • Figure 3: Taken from [salient]. The most basic statistical complexity indexes applied to the 18 molecular compounds for which the spectral data were made available in [marshall_murray_cronin_2017]. MA stands for Molecular Assembly, based on the assembly index of the molecular compounds. These results were obtained even without optimising the algorithms, after a simple flattening and binarisation procedure of the data (full methods, data and code available at [salient]). The results show that other statistical measures separate living from non-living compounds just as well as MA does (or better).
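
The Figure 3 caption mentions a flattening and binarisation step followed by basic statistical complexity indexes. A minimal sketch of that general idea is given below, under stated assumptions: the mean-value threshold, the use of `zlib` (a DEFLATE implementation) as the compressor, and the two synthetic toy sequences are all choices made for this illustration and are not taken from the methods in [salient]. The names `binarise` and `compressed_length` are hypothetical.

```python
import random
import zlib


def binarise(values, threshold=None) -> str:
    """Map a flat numeric sequence to a bit string: '1' if value > threshold.
    Assumption: the threshold defaults to the mean of the values."""
    values = list(values)
    if not values:
        raise ValueError("values must be non-empty")
    if threshold is None:
        threshold = sum(values) / len(values)
    return "".join("1" if v > threshold else "0" for v in values)


def compressed_length(bits: str) -> int:
    """Length in bytes of the zlib-compressed bit string, used here as a
    crude statistical complexity index (smaller means more regular)."""
    return len(zlib.compress(bits.encode("ascii"), 9))


# Synthetic toy inputs (not spectral data): a periodic and an irregular sequence.
rng = random.Random(0)  # fixed seed so the comparison is repeatable
regular = "01" * 500
irregular = "".join(rng.choice("01") for _ in range(1000))

regular_size = compressed_length(regular)
irregular_size = compressed_length(irregular)
```

On these toy inputs the periodic sequence compresses to far fewer bytes than the irregular one, which is the sense in which compressed length can act as a separating index. Any real comparison would depend on the data, the binarisation rule, and the compressor chosen.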

Theorems & Definitions (23)

  • Lemma 1
  • proof
  • Corollary 2
  • proof
  • Lemma 3
  • proof
  • Corollary 4
  • proof
  • Theorem 5
  • proof
  • ...and 13 more