Local Compositional Complexity: How to Detect a Human-readable Message
Louis Mahon
TL;DR
This work tackles the problem of identifying meaningful, human-readable messages in static data. It proposes Local Compositional Complexity (LCC), a computable metric based on a two-part description that separates the structured content (A) from the unstructured content (B). Local compositionality, defined as a locally tree-like structure, is taken to capture the meaningful organization in data, and the LCC score is the length of the structured portion of the optimal description under MDL/MML principles. The paper demonstrates the approach on discrete (text), continuous (images, audio), and cross-domain data: random or repetitive data yield low LCC and real human signals yield high LCC, with the Arecibo message serving as an example for the extraterrestrial-signal setting. The study also links the framework to the entropy and macrostate concepts of statistical mechanics, discusses implications for compression, and argues that LCC offers a principled, computable way to distinguish meaningful content from noise, with potential applications in detecting non-human communication.
Abstract
Data complexity is an important concept in the natural sciences and related areas, but lacks a rigorous and computable definition. In this paper, we focus on a particular sense of complexity that is high if the data is structured in a way that could serve to communicate a message. In this sense, human speech, written language, drawings, diagrams and photographs are high complexity, whereas data that is close to uniform throughout or populated by random values is low complexity. We describe a general framework for measuring data complexity based on dividing the shortest description of the data into a structured and an unstructured portion, and taking the size of the former as the complexity score. We outline an application of this framework in statistical mechanics that may allow a more objective characterisation of the macrostate and entropy of a physical system. Then, we derive a more precise and computable definition geared towards human communication, by proposing local compositionality as an appropriate specific structure. We demonstrate experimentally that this method can distinguish meaningful signals from noise or repetitive signals in auditory, visual and text domains, and could potentially help determine whether an extra-terrestrial signal contained a message.
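The two-part idea in the abstract (split the shortest description into a structured and an unstructured portion, then report the size of the former) can be illustrated with a toy sketch. This is not the paper's LCC algorithm: instead of local compositionality it uses order-k context (Markov) models as a stand-in model class, and the names two_part_cost and toy_complexity, the 0.5*log2(n) parameter cost, and the test strings are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def two_part_cost(seq, k):
    """Return (structured_bits, unstructured_bits) for an order-k context model."""
    alphabet_bits = math.log2(len(set(seq)))  # bits to name one symbol
    n = len(seq) - k                          # number of predicted symbols
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1

    # Structured part: the model itself. Each observed (context, symbol) pair
    # costs its k+1 symbol names plus 0.5*log2(n) bits for its frequency
    # (a common MDL precision heuristic; an assumption of this sketch).
    n_params = sum(len(c) for c in counts.values())
    structured = n_params * ((k + 1) * alphabet_bits + 0.5 * math.log2(n))

    # Unstructured part: the first k symbols sent raw, then every remaining
    # symbol encoded given its context under the fitted model.
    unstructured = k * alphabet_bits
    for c in counts.values():
        total = sum(c.values())
        for cnt in c.values():
            unstructured -= cnt * math.log2(cnt / total)
    return structured, unstructured


def toy_complexity(seq, max_k=3):
    """Structured-part size (bits) of the shortest two-part description found."""
    costs = [two_part_cost(seq, k) for k in range(max_k + 1)]
    structured, _ = min(costs, key=lambda c: c[0] + c[1])
    return structured


rng = random.Random(0)
uniform = "a" * 2000
noise = "".join(rng.choice("abcd") for _ in range(2000))
# Noisy cycle a->b->c->d->a: mostly predictable from the previous symbol.
chain = ["a"]
for _ in range(1999):
    nxt = "abcd"[("abcd".index(chain[-1]) + 1) % 4]
    chain.append(nxt if rng.random() < 0.9 else rng.choice("abcd"))
patterned = "".join(chain)

for name, s in [("uniform", uniform), ("noise", noise), ("patterned", patterned)]:
    print(name, round(toy_complexity(s), 1))
```

In this sketch the uniform and random strings are best described by a tiny model, so their structured part is small, while the patterned string justifies paying for a larger model and scores higher. The model class here is far cruder than the locally compositional structure the paper proposes, so the numbers only illustrate the scoring principle.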
