Non-Random Data Encodes its Geometric and Topological Dimensions

Hector Zenil; Felipe S. Abrahão; Luan C. S. M. Ozelim

TL;DR

The paper tackles the problem of decoding non-random signals embedded in unknown multidimensional spaces without any prior knowledge of the encoding scheme, by introducing Algorithmic Information Dynamics (AID) and a perturbation-based reconstruction approach. It combines perturbation analysis with the Block Decomposition Method (BDM) to estimate algorithmic complexity and identify the original multidimensional space $\mathcal{S}$ that best explains a received message, achieving a zero-knowledge one-way communication framework. Empirically, it demonstrates that downward spikes in complexity landscapes reliably indicate correct dimensions across text, image, and audio-like data, and it provides a theoretical foundation connecting algorithmic information theory to a generalized noisy-channel setting. The work offers a universal methodology for decoding without priors, with potential impact on signal processing, cryptography, and detection of technosignatures, and it ties together geometry, topology, and semantics through compression-based information measures.

Abstract

Based on the principles of information theory, measure theory, and theoretical computer science, we introduce a signal deconvolution method with a wide range of applications to coding theory, particularly in zero-knowledge one-way communication channels, such as in deciphering messages (i.e., objects embedded into multidimensional spaces) from unknown generating sources about which no prior knowledge is available and to which no return message can be sent. Our multidimensional space reconstruction method from an arbitrary received signal is proven to be agnostic vis-à-vis the encoding-decoding scheme, computation model, programming language, formal theory, the computable (or semi-computable) method of approximation to algorithmic complexity, and any arbitrarily chosen (computable) probability measure. The method derives from the principles of an approach to Artificial General Intelligence (AGI) capable of building a general-purpose model of models independent of any arbitrarily assumed prior probability distribution. We argue that this optimal and universal method of decoding non-random data has applications to signal processing, causal deconvolution, topological and geometric properties encoding, cryptography, and bio- and technosignature detection.
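
The idea that a dip in a complexity landscape marks the correct dimension can be illustrated with a minimal Python sketch. It is not the paper's method: BDM and the perturbation analysis are replaced here by a much cruder stand-in, a count of cells that differ from the cell directly below them once the stream is laid out at a candidate width. The 32 x 32 rectangle image, the candidate range 16 to 64, and the function names are all assumptions made for illustration.

```python
def vertical_mismatches(stream, width):
    """Count cells that differ from the cell directly below them when the
    1D `stream` is laid out row by row with the given `width`.
    This is a crude regularity proxy, NOT BDM."""
    return sum(a != b for a, b in zip(stream, stream[width:]))


def complexity_landscape(stream, candidate_widths):
    """Map each candidate width to its proxy score (lower means more regular)."""
    return {w: vertical_mismatches(stream, w) for w in candidate_widths}


# Hypothetical message: a 32 x 32 binary image holding one filled rectangle,
# flattened row by row into a stream whose width the receiver does not know.
TRUE_WIDTH, HEIGHT = 32, 32
stream = [
    1 if 8 <= row <= 23 and 10 <= col <= 21 else 0
    for row in range(HEIGHT)
    for col in range(TRUE_WIDTH)
]

# Scan candidate widths and keep the one with the lowest score.
landscape = complexity_landscape(stream, range(16, 65))
best_width = min(landscape, key=landscape.get)
print(best_width, landscape[best_width])  # prints: 32 24
```

Under these assumptions the score is lowest at width 32, and a shallower dip appears at 64, which loosely echoes the spikes at multiples of the original dimension described for Figure 1.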

Paper Structure (18 sections, 1 theorem, 5 equations, 7 figures)

Introduction
Empirical Results
On the perturbation analysis of compressed data
Information content and bitwise perturbations
Random binary strings
String of random characters
Sentences from Darwin's "Origin of Species"
Information content and structural perturbations
Methods
Formalism and basic concepts
Theoretical results
Toward a general theory for zero-knowledge one-way communication
Discussion
Avoiding algorithmic information distortions in arbitrarily complex multidimensional spaces
Algorithmic information dynamics in communication problems
...and 3 more sections

Key Result

Theorem 1

Let $y$ be an arbitrary object embedded into an arbitrary multidimensional space $\mathcal{S}$ such that the following conditions hold, where $\mathcal{P}$ is any algorithmic perturbation that transforms $y$ embedded into $\mathcal{S}$ into another $y'$ embedded into $\mathcal{S}'$. Let $\mathbf{P}' \left[ \cdot , \cdot \right]$ be the uniform probability measure over the space of all possible

Figures (7)

Figure 1: Top left: Reconstruction of a 3D magnetic resonance image of a knee embedded in a cube. Top right: Reconstruction from the bottom: perturbation analysis on various partitions. Spikes occur at multiples of the original dimension: 64, 128, and 192. When the linear signal stream is partitioned at the first candidate, the next dimensions are indicated by downward spikes, or by upward spikes even on the first pass. A mirror image (top right) is indicated and reconstructed as the most likely candidate, and the correct knee configuration appears at the second spike (top left).
Figure 2: Algorithmic perturbation analysis on strings of size 402. Random binary numbers: For a binary random string with a balanced number of 0s and 1s, flipping an increasing number of bits (from 0, the original string, up to 402, all the characters in the string) leaves the size of the LZW dictionary and of the zlib-encoded string approximately the same (the median value is constant in the plots). The same is observed for Shannon entropy, which does not change appreciably and stays close to its maximum value. This is to be expected, since the original string is already in a maximal-entropy state. BDM, on the other hand, decreases slightly, which indicates that the bit flips bring the string to a slightly more informative state (and a less random one, because, in theory, perfect randomness requires the number of 1s and 0s to be the same, and this is not guaranteed for the flipped strings). Perturbing balanced binary random messages indicates that, overall, the only information gain possible in the process is obtained when the bit balance is broken, which appears graphically as a concave-up curve for the BDM values. This would be equivalent to tampering with a coin and observing a series of its flips. Random characters sampled from a 95-character alphabet after vowel encoding: This new binary string is unbalanced, with 12.19% of its bits being 1s. Bit flips reveal that all the metrics tend to increase until half of the bits are flipped. This is to be expected, since the original string, although randomly generated, used an unbalanced encoding, which increased its information content. It is also worth noting that the mere fact that the numbers of 0s and 1s differ provides information about the message. Therefore, the concave-down curve observed implies that the original message is more informative than any of its perturbations. This is because signals carrying meaning are far removed from randomness, and random perturbations make the text more random [Zenil2019c], even when the methods know nothing about words, grammar, or anything linguistic. Random characters sampled from a 95-character alphabet after space encoding: This new binary string is highly unbalanced, with 1.49% of its bits being 1s. Bit flips reveal the same behaviour as observed for the vowel encoding. Compared to the vowel encoding, all the metrics span a larger interval when highly unbalanced encodings are considered.
Figure 3: Algorithmic perturbation flowchart for strings of characters randomly picked from a character pool. Even when randomly sampling from a character pool, it is possible that maximal entropy is not observed even in the original alphabet space. By using encoding to convert characters to binary, information content as well as order is introduced in the process, thereby making the converted string "less" random than the original one.
Figure 4: Algorithmic perturbation analysis on strings of size 3216. Random characters sampled from a 95-character alphabet after UTF-8 encoding: This binary string is almost balanced, with 47.42% of its bits being 1s. Bit flips were performed for an increasing number of bits (from 0, the original string, up to 3216, all the characters in the string), revealing that, with the exception of BDM, all the metrics change only slightly (less than 2%). BDM, on the other hand, increases continuously until half of the bits are flipped. Once again, the concave-down curve observed implies that the original message is more informative than any of its perturbations. Compared to the other, unbalanced encodings, as soon as a more balanced encoded string is seen, entropy and compression algorithms tend to lose their power to detect changes in information content. Excerpt of Darwin's "Origin of Species" after UTF-8 (unbalanced) encoding: This binary string is almost balanced, with 45.55% of its bits being 1s. Bit flips reveal that BDM and zlib were the metrics most sensitive to changes in information content. The concave-down curve observed implies that the original message is more informative than any of its perturbations.
Figure 5: Algorithmic perturbation analysis on strings of size 6432. Random characters sampled from a 95-character alphabet after balanced encoding: This binary string is perfectly balanced, with 50% of its bits being 1s. Bit flips were performed for an increasing number of bits (from 0, the original string, up to 6432, all the characters in the string), revealing that, except for BDM and zlib, all the metrics show only small changes (less than 2%). BDM and zlib, on the other hand, increase continuously until half of the bits are flipped. BDM is more sensitive than zlib, as the size of its plateau is smaller, with longer lateral climbs. Once more, the concave-down curve observed implies that the original message is more informative than any of its perturbations. Overall, entropy and LZW lose their power as proxies of information content whenever the binary string is balanced. Excerpt of Darwin's "Origin of Species" after balanced encoding: This binary string is also perfectly balanced, with 50% of its bits being 1s. Bit flips once again reveal the ability of BDM and zlib to assess the information content of the message, and the overall analysis implies that the original message is more informative than any of its perturbations.
...and 2 more figures
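
The bit-flip experiments described in the captions above can be approximated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' code: it tracks only Shannon entropy and zlib size (BDM and LZW are omitted), the string is a synthetic 402-bit sequence with exactly 49 ones (about 12.19%, matching the vowel-encoding panel of Figure 2), and the seed, the ASCII rendering passed to zlib, and all function names are hypothetical choices.

```python
import math
import random
import zlib


def shannon_entropy(bits):
    """Shannon entropy (bits per symbol) of the 0/1 frequencies in `bits`."""
    n = len(bits)
    entropy = 0.0
    for count in (bits.count(0), bits.count(1)):
        if count:
            p = count / n
            entropy -= p * math.log2(p)
    return entropy


def zlib_size(bits):
    """Bytes used by zlib for the ASCII '0'/'1' rendering of `bits`."""
    return len(zlib.compress("".join(map(str, bits)).encode("ascii"), 9))


def flip_bits(bits, k, rng):
    """Return a copy of `bits` with k distinct random positions flipped."""
    flipped = list(bits)
    for i in rng.sample(range(len(bits)), k):
        flipped[i] ^= 1
    return flipped


def perturbation_curve(bits, steps, rng):
    """Map each number of flipped bits in `steps` to (entropy, zlib size)."""
    curve = {}
    for k in steps:
        perturbed = flip_bits(bits, k, rng)
        curve[k] = (shannon_entropy(perturbed), zlib_size(perturbed))
    return curve


rng = random.Random(0)  # fixed seed (assumption) so runs are repeatable
n, ones = 402, 49       # 49 / 402 is about 12.19% ones
original = [0] * n
for i in rng.sample(range(n), ones):
    original[i] = 1

# Evaluate the unperturbed string, half of the bits flipped, and all flipped.
curve = perturbation_curve(original, [0, n // 2, n], rng)
for k, (h, z) in curve.items():
    print(f"flipped={k:3d}  entropy={h:.3f}  zlib_bytes={z}")
```

For this unbalanced string the entropy rises once half of the bits are flipped and returns to its starting value when every bit is flipped, which is the concave-down shape the captions describe. The zlib sizes depend on the seed and are reported without any claim about their exact values.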

Theorems & Definitions (1)

Theorem 1: Theorem 2.10 in the Supplementary Material.
