On the Salient Limitations of the Methods of Assembly Theory and their Classification of Molecular Biosignatures
Abicumaran Uthamacumaran, Felipe S. Abrahão, Narsis A. Kiani, Hector Zenil
TL;DR
The paper challenges Assembly Theory (AT) by arguing that its molecular assembly index (MA) is not a novel complexity measure but a dictionary-based lossless compression analogue, essentially equivalent to LZ77/LZ78-style coding. Through cross-data benchmarks on mass spectrometry signatures, InChI strings, and bond-distance matrices, MA shows strong correlations with standard compression schemes (e.g., 1D-RLE, 1D-Huffman, LZW) and fails to outperform established algorithmic-information measures such as BDM and related CTM-inspired approaches. The authors also demonstrate a deceiving-molecule phenomenon where objects with high MA can arise from simple, resource-bounded processes, leading to false positives and challenging claims that MA uniquely detects life or extraterrestrial biosignatures. They argue for adopting algorithmic-information frameworks that account for environment, modularity, and higher-order causality, rather than relying on MA/AT alone, to robustly discriminate biosignatures across data representations. Overall, the work clarifies significant limitations of MA and urges a shift toward intrinsic complexity measures rooted in algorithmic information theory for life-detection and biosignature analysis.
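To make the phrase "dictionary-based lossless compression" concrete, the following is a minimal sketch of textbook LZ78 phrase parsing on a plain string. It is an illustration only: it is not the authors' benchmark code and it does not compute the assembly index. The function name `lz78_phrases` is hypothetical, introduced here for the example.

```python
def lz78_phrases(s: str) -> list[str]:
    """Split s into LZ78 phrases.

    Each phrase is the shortest prefix of the remaining input that is not
    yet in the dictionary, so repeated blocks are reused rather than
    spelled out again. (Hypothetical helper, for illustration only.)
    """
    dictionary = set()
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in dictionary:
            # New phrase: record it and start parsing the next one.
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover suffix that already exists in the dictionary.
        phrases.append(current)
    return phrases


repetitive = "abababab"
diverse = "abcdefgh"
print(lz78_phrases(repetitive))
print(len(lz78_phrases(repetitive)), len(lz78_phrases(diverse)))
```

A string built from repeated blocks parses into fewer phrases than a string of the same length with no repetition. This reuse of previously seen blocks is the behaviour the summary describes as shared between dictionary coders and the assembly pathway method.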
Abstract
We demonstrate that the assembly pathway method underlying assembly theory (AT) is an encoding scheme widely used by popular statistical compression algorithms. We show that in all cases (synthetic or natural) AT performs similarly to other simple coding schemes and underperforms compared to system-related indexes based upon algorithmic probability, which take into account not only statistical repetitions but also the likelihood of other computable patterns. Our results imply that the assembly index does not offer substantial improvements over existing methods, including traditional statistical ones, and that the separation between living and non-living compounds obtained with these methods has been reported before.
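As a rough illustration of what "simple coding schemes" can look like, the sketch below computes a run-length encoding and the total Huffman-coded length, in bits, of a string. It assumes plain character strings as input and is not taken from the paper's benchmarks. The names `rle_pairs` and `huffman_total_bits` are hypothetical, and counting one bit per symbol for a single-symbol alphabet is a convention chosen here.

```python
import heapq
from collections import Counter
from itertools import groupby


def rle_pairs(s: str) -> list[tuple[str, int]]:
    """Run-length encode s as (symbol, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def huffman_total_bits(s: str) -> int:
    """Total length in bits of s under an optimal Huffman code.

    Uses the standard identity that the encoded length equals the sum of
    the weights of all internal nodes created while merging the tree.
    """
    freq = Counter(s)
    if len(freq) <= 1:
        # Assumption for this sketch: one bit per symbol when only one
        # distinct symbol occurs (zero bits for the empty string).
        return len(s)
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b          # weight of the new internal node
        heapq.heappush(heap, a + b)
    return total


sample = "aaaabbc"
print(rle_pairs(sample))
print(huffman_total_bits(sample))
```

Both measures respond only to statistical regularities (runs and symbol frequencies). That is the class of pattern the abstract contrasts with indexes based on algorithmic probability, which are meant to also capture other computable patterns.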
