An Optimal, Universal and Agnostic Decoding Method for Message Reconstruction, Bio and Technosignature Detection

Hector Zenil; Alyssa Adams; Felipe S. Abrahão; Luan Ozelim

TL;DR

This work tackles reconstructing messages sent over zero-knowledge one-way channels when the emitter’s encoding is unknown. It introduces an agnostic, perturbation-based reconstruction method grounded in Algorithmic Information Dynamics (AID) and the universal distribution, validating it on the Arecibo message ($1679$ bits) arranged as $23\times73$ and on a diverse image set from Caltech-101. By scanning candidate partitions with metrics such as the Block Decomposition Method (BDM), entropy, and zlib-based compressibility, the approach identifies low-complexity, likely original encodings, demonstrating robust decoding without prior knowledge. The authors argue that this framework links information theory, geometry, and semantics, with implications for life and technosignature detection, cryptography, and coding theory, and outline a path toward universal generative models relevant to Artificial General Intelligence (AGI).

Abstract

We present an agnostic signal reconstruction method for zero-knowledge one-way communication channels in which a receiver aims to interpret a message sent by an unknown source about which no prior knowledge is available and to which no return message can be sent. Our reconstruction method is agnostic vis-à-vis the arbitrarily chosen encoding-decoding scheme and other observer-dependent characteristics, such as the arbitrarily chosen computational model, probability distributions, or underlying mathematical theory. We investigate how non-random messages encode information about their intended physical properties, such as dimension and length scales of the space in which a signal or message may have been originally encoded, embedded, or generated. We focus on image data as a first illustration of the capabilities of the new method. We argue that our results have applications to life and technosignature detection, and to coding theory in general.
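The partition scan mentioned in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces BDM with a plain Shannon entropy over overlapping $2\times2$ pixel blocks, and it uses a toy periodic stream (a vertical bar, 11 ones followed by 12 zeros in each of 73 rows) rather than the Arecibo data. The function names `reshape_stream`, `block_entropy` and `scan_partitions`, the block size `k`, and the scanned width range are all assumptions made for illustration. A 2D-aware score is needed because the row-major 1D stream is identical under every partition.

```python
from collections import Counter
from math import log2

import numpy as np


def reshape_stream(bits, width):
    """Arrange a 1-D bit stream into rows of `width`, dropping an incomplete last row."""
    rows = len(bits) // width
    return np.asarray(bits[: rows * width], dtype=np.uint8).reshape(rows, width)


def block_entropy(image, k=2):
    """Shannon entropy (in bits) of the distribution of overlapping k-by-k blocks."""
    rows, cols = image.shape
    counts = Counter(
        image[r : r + k, c : c + k].tobytes()
        for r in range(rows - k + 1)
        for c in range(cols - k + 1)
    )
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())


def scan_partitions(bits, widths, k=2):
    """Score every candidate row width; low scores mark candidate encodings."""
    return {w: block_entropy(reshape_stream(bits, w), k) for w in widths}


# Toy stand-in for a received message (hypothetical data, not the Arecibo bits):
# 73 identical rows of 23 pixels, serialised row by row into 1679 bits.
row = np.array([1] * 11 + [0] * 12, dtype=np.uint8)
bits = np.tile(row, 73)

# Widths are kept below 46 so that multiples of 23 do not tie with 23 itself.
scores = scan_partitions(bits, range(2, 46))
best = min(scores, key=scores.get)
print(best, round(scores[best], 4))
```

For this toy stream the lowest score falls at width 23, the analogue of the downward spike described in the figure captions. Real messages are not periodic, so the gap between the correct partition and its neighbours will generally be smaller than in this example.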

Paper Structure (5 sections, 9 figures)

Introduction
On the information content of the Arecibo message
Assessing the method's performance over several different classes of images
On the link between the universal distribution, AID, the method hereby proposed and Artificial General Intelligence (AGI)
Conclusions

Figures (9)

Figure 1: Left: The original Arecibo message intended to be reconstructed, but sent as a linear stream from the radio telescope in Arecibo, Puerto Rico. The 1,679 bits are meant to be arranged into 23 columns and 73 rows, 23 and 73 being two prime numbers whose product is 1,679. Right: If the stream is instead arranged into 23 rows and 73 columns, the intended visual interpretation of the message is scrambled, which may result in a figure that is closer to being statistically random. What we show is that the message is still there, concealed, and can be deciphered by algorithmic deconvolution.
Figure 2: Perturbations on the original 1D stream of binary digits of the Arecibo message. The "correct" shape of the message is represented in red and is observed for downward BDM peaks in the low-BDM region.
Figure 3: Top left: Most possible partitions result in random-appearing configurations with high corresponding complexity, indicating measurable randomness. Bottom: Some partitions ($n$ values) will approximate the originally encoded meaning (third from the right). Other configurations result in images with higher complexity values. This sequence shows the images in the approximate vicinity of the correct bidimensional configuration (i.e., partition) and illustrates fast convergence to low complexity. Top right: When different information indexes are computed across different configurations, a downward-pointing spike indicates message (image) configurations that correspond to low-complexity image(s). In this plot, the log complexity index is simply a scaled version of each metric, created to make the spikes more visible. This provides a prior-knowledge-agnostic and objective method for inferring a message's original encoding. Of the various measures, BDM, which combines classical information (entropy) for long ranges with a measure motivated by algorithmic probability for short ranges, is the most sensitive and accurate in this regard. Traditional compression and entropy may also help find the right configuration amongst the top spiking candidates. The signal-to-noise ratio was amplified in favour of the hidden structure by scaling the original image by a factor of 6 in both width and height (this amplification was necessary to ensure that the overhead compression algorithms need to decode the messages does not bias their results).
Figure 4: The method's resilience in the face of some noise. With 3% of the bits of the original $23 \times 73=1679$-pixel image randomly flipped (which means about 1.5% were actually negated), the method remains sensitive and displays a small downward spike at the value 23, uncovering its length, but the signal is lost when more bits are flipped.
Figure 5: Three quantitative methods are applied (red: zlib compression; blue: BDM; orange: entropy). Left: The methods are highly insensitive to signal and highly sensitive to noise (16.5% of pixels randomly flipped). In that case, the correct partition size ($n=23$, so $n-9=14$) is only modestly highlighted. Right: When the original image is enlarged by a factor of 6 in each dimension (i.e. 1 pixel becomes a $6\times6$ array), the methods become less sensitive to noise and more sensitive to signal, with BDM significantly outperforming compression and Shannon entropy: BDM shows sensitivity with up to 60% of pixels flipped (hence about 30% of the original image), versus about 50% for compression and Shannon entropy. Downward spikes (right) appear at $23 \times 6 - 2= 136$.
...and 4 more figures
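
The noise and amplification steps described in the captions of Figures 4 and 5 can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that "randomly flipped" means overwriting a chosen fraction of pixels with fresh random bits (so that roughly half of them actually change, consistent with "3% flipped, about 1.5% negated"), and that enlargement by 6 means replacing each pixel with a $6\times6$ block of the same value. The names `randomise_bits` and `upscale`, the seed, and the toy image are hypothetical.

```python
import numpy as np


def randomise_bits(image, fraction, rng):
    """Overwrite a random `fraction` of pixels with fresh random bits.

    On average about half of the selected pixels end up negated;
    the rest happen to keep their original value.
    """
    noisy = image.flatten()  # flatten() always returns a copy
    n_pick = round(fraction * noisy.size)
    idx = rng.choice(noisy.size, size=n_pick, replace=False)
    noisy[idx] = rng.integers(0, 2, size=n_pick, dtype=noisy.dtype)
    return noisy.reshape(image.shape)


def upscale(image, factor=6):
    """Replace each pixel with a factor-by-factor block of the same value."""
    return np.kron(image, np.ones((factor, factor), dtype=image.dtype))


# Toy 73-row by 23-column binary image (hypothetical, not the Arecibo message).
row = np.array([1] * 11 + [0] * 12, dtype=np.uint8)
image = np.tile(row, (73, 1))

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
noisy = randomise_bits(image, 0.03, rng)
changed = int((noisy != image).sum())

big = upscale(noisy, 6)
print(image.shape, big.shape, changed)
```

The enlarged array can then be serialised and passed to the same partition scan as the original stream; the candidate widths grow by the same factor, so the correct partition of the enlarged toy image has width $23 \times 6 = 138$.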
