Assembly Theory Reduced to Shannon Entropy and Rendered Redundant by Naive Statistical Algorithms

Luan Ozelim, Abicumaran Uthamacumaran, Felipe S. Abrahão, Santiago Hernández-Orozco, Narsis A. Kiani, Jesper Tegnér, Hector Zenil

TL;DR

This critique analyzes Assembly Theory (AT) and its central measure Ai, arguing that Ai offers no novel causal insights beyond established information-theoretic measures such as Shannon entropy and LZW-based compression. The authors demonstrate, both theoretically and empirically, that Ai is effectively subsumed by the Block Decomposition Method (BDM) and related algorithmic-information frameworks (CTM, AP), and that Ai cannot robustly quantify selection or evolution since environment-dependent fitness signals cannot be captured by Ai alone. Through synthetic string experiments and analyses of molecular data, they show Ai converges to LZW and entropy with increasing object size, and that molecular-length effects largely drive reported separations between living and nonliving systems. The work concludes that AT's claims of unifying physics and biology are unfounded, repositioning Ai as a weaker, redundant instantiation of a broader computable-information toolkit with limited predictive power. Overall, Ai provides no advantages over traditional compression-based measures for detecting biosignatures or evolutionary patterns, and the purported physical grounding of AT is called into question. The study advocates recasting AT within a rigorous algorithmic-information-theoretic framework to avoid overstated claims and to leverage more robust metrics like BDM/CTM/AP for causal analysis in molecular data and evolution.

Abstract

Assembly Theory (AT) and its central measure, the assembly index (Ai), represent an invaluable opportunity to address some of the most persistent and widespread conflations and misconceptions about computability and complexity theory in science. The AT defence embodies several common, concurrent misconceptions that compound one another: the belief that Turing machines impose artefactual constraints, the mischaracterisation of Kolmogorov complexity as inapplicable, and the claim that Ai differs from Shannon entropy or compression algorithms. Here we show that the new arguments advanced by the AT group in their defence are based on misleading and incomplete experiments that, when completed, reveal the extent of the correlation and overlap with popular statistical compression algorithms, in agreement with the previously proved and reported mathematical equivalence to Shannon entropy, which remains undisputed. Through theoretical and empirical analysis, we show that Ai does not offer a path towards fundamentally novel causal or informational insights beyond what existing statistical frameworks already offer. Rather than offering a unifying theory of life, as the AT authors suggest, we argue that AT obfuscates the field and provides a cautionary example of how the accumulation of conceptual mistakes can lead to a misleading theory. Finally, we show that Ai is a particular, limited case of another complexity metric based on algorithmic (Kolmogorov) complexity, which consists of decomposing an object into its causal blocks and which goes beyond, and outperforms, AT.

Paper Structure (25 sections, 19 equations, 11 figures, 7 tables)


Figures (11)

  • Figure 1: Main figure: correlation plot between the most popular statistical compression algorithm LZW (behind ZIP and PNG) and the assembly index (Ai) for a growing ZBC sequence (length of 100 characters) and its random permutations, following the experiment in kempes2024. The colour code corresponds to the number of strings that produced the pair (Ai, LZW) during the experiments. In their effort to demonstrate the novelty of Ai compared to LZW, the authors of Assembly Theory reported only the component with string size equal to 15 (highlighted in (b)), which has a weak correlation of 0.25 kempes2024, without conducting or reporting the full experiment. The apparent scattering of that component is an artifact of the fixed scale. The quick asymptotic convergence to a Spearman correlation of 1 is shown for multiple examples in Table \ref{tabpatterned}, and the undisputed mathematical equivalence is given in abrahao2024. Contrary to the claim in kempes2024, the correlation with the assembly index does not weaken with object size (Table \ref{tabpatterned}).
  • Figure 2: Density correlation plots for strings of growing length confirm the convergent asymptotic behaviour between the assembly index and LZW. A and B are random sequences of length up to 100, while C and D are patterned sequences of repeated blocks of up to 100 characters. C shows a repeating patterned block of the letters ABCDE, while D shows a ten-letter repeating block. The variance is lower in the random sequences, where growing size converges to very high correlations of 0.98-0.99, while in patterned sequences the correlations of the growing chain are high, at about 0.9. Even if the scattering appears visually to increase for patterned strings, this is a visual artifact from the number of greater block lengths. As shown numerically in Table \ref{tabpatterned}, the correlation in all cases converges quickly to 1 as a function of length, as shown in Figs. \ref{fig:sub12} and \ref{fig:sub22}. The exercise was nevertheless futile, given that we had already proven the mathematical equivalence in abrahao2024. Contrary to the claim in kempes2024, the correlation with the assembly index does not weaken with object size, as shown in Table \ref{tabpatterned}.
  • Figure 3: Asymptotic behaviour of (a) the mean and (b) the standard deviation of the ratio between log LZW and log Ai values for strings up to size 200, built by randomly picking characters from the ZBC string 10,000 times for each size. These plots quantify the full convergence of the assembly index (Ai) to LZW.
  • Figure 4: Quantile regressions for the 5%, 50% and 95% quantiles, and Pearson correlation (corr), between MS2 and each of MA (top), the length of the InChI string (middle) and the LZW of the InChI string (bottom), using the data from Marshall2021. The shaded area represents a 90% quantile interval. For every metric, linear relationships with similar predictive capability are observed. This directly undercuts the AT authors' claim that one of the main advantages of MA (or, equivalently, Ai) is that it can be measured experimentally, since the other metrics (LZW and the length of the InChI codes) can also be estimated directly from experimental MS2 data, as we have done here. Contrary to what the authors claim in kempes2024, the authors of AT have failed to control for molecular length, which drives their index. The authors of AT have thus far failed to show a single example in which LZ or Shannon entropy cannot reproduce a result produced by Ai or AT, meaning that Ai and AT are simply equivalent to applying LZ or Shannon entropy with many extra steps.
  • Figure 5: Linear relation between MA and the length of the InChI strings of molecules, using the data from the original AT paper Marshall2021. This shows that just by taking the length of SMILES notation, one can reproduce the results of AT and their assembly index. If AT had any advantage over other representations and other compression algorithms, it should produce different, if not better, results, but it does not. This was conducted on the same set of molecular compounds used by the authors in their paper Marshall2021, as the basic control experiment they did not perform, and was reported before to be able to separate organic from non-organic molecules zenil2018.
  • ...and 6 more figures
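
The kind of string experiment described in the captions above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' code: it uses a textbook LZW encoder, first-order Shannon entropy and a hand-written Spearman rank correlation, and it compares LZW against Shannon entropy rather than against the assembly index, because no Ai implementation is given here. The alphabet `"ABCDE"`, the sample sizes, the seed and all function names are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def lzw_code_count(text: str) -> int:
    """Number of codes a textbook LZW encoder emits for `text`."""
    if not text:
        return 0
    # Initial dictionary holds every single character seen in the input.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    phrase, codes = "", 0
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate
        else:
            codes += 1  # emit the code for `phrase`
            dictionary[candidate] = len(dictionary)
            phrase = ch
    return codes + 1  # flush the final phrase


def shannon_entropy_bits(text: str) -> float:
    """Total first-order Shannon entropy of `text` in bits (length * H)."""
    n = len(text)
    return -sum(c * math.log2(c / n) for c in Counter(text).values())


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def experiment(max_len=100, samples_per_len=20, alphabet="ABCDE", seed=0):
    """Correlate LZW code counts with entropy over random strings of growing length."""
    rng = random.Random(seed)
    lzw_vals, ent_vals = [], []
    for n in range(2, max_len + 1):
        for _ in range(samples_per_len):
            s = "".join(rng.choice(alphabet) for _ in range(n))
            lzw_vals.append(lzw_code_count(s))
            ent_vals.append(shannon_entropy_bits(s))
    return spearman(lzw_vals, ent_vals)


if __name__ == "__main__":
    print(f"Spearman(LZW, entropy) = {experiment():.3f}")
```

To mirror the experiment in the captions more closely, one would replace `shannon_entropy_bits` with an assembly-index routine and restrict the strings to a single length to reproduce the fixed-size component.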