Algorithmic causal structure emerging through compression
Liang Wendong, Simon Buchholz, Bernhard Schölkopf
TL;DR
The paper investigates how causal and symmetric structures can emerge from data compression when the data come from multiple environments and the intervention targets are unknown. It introduces algorithmic causality, which models causal mechanisms as CFMPs implemented by Turing machines and uses UFCC-based finite-codebook bounds to select causal directions by minimizing total code length. Through theoretical results and case studies on causal factorizations and symmetries, it shows that compression-driven model selection can reveal causal structure even when causal models are not identifiable. The authors hypothesize that large-scale models, such as large language models, may exhibit emergent algorithmic causality as a by-product of compressing data generated by shared mechanisms. The framework offers a complementary lens to Pearlian causality, one that focuses on regularities captured by algorithmic minimality rather than on interventions alone.
Abstract
We explore the relationship between causality, symmetry, and compression. We build on and generalize the known connection between learning and compression to a setting where causal models are not identifiable. We propose a framework where causality emerges as a consequence of compressing data across multiple environments. We define algorithmic causality as an alternative definition of causality when traditional assumptions for causal identifiability do not hold. We demonstrate how algorithmic causal and symmetric structures can emerge from minimizing upper bounds on Kolmogorov complexity, without knowledge of intervention targets. We hypothesize that these insights may also provide a novel perspective on the emergence of causality in machine learning models, such as large language models, where causal relationships may not be explicitly identifiable.
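The idea that a causal direction can be selected by compressing data from several environments can be illustrated with a toy sketch. The code below is not the paper's construction: it swaps the Kolmogorov-complexity upper bounds for a crude two-part, MDL-style code over binary variables, and it assumes that each environment has its own marginal for the cause while the conditional for the effect is shared across environments. The function names, the per-parameter cost, and the example counts are all hypothetical.

```python
import math
from collections import Counter


def bernoulli_bits(ones, total):
    """Bits to encode `total` binary symbols with `ones` ones at the empirical frequency."""
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -(ones * math.log2(p) + (total - ones) * math.log2(1 - p))


def code_length(envs, cause):
    """Two-part code length in bits for P_e(cause) * P(effect | cause).

    envs:  list of Counters mapping (x, y) -> count, one per environment.
    cause: 0 to treat X as the cause, 1 to treat Y as the cause.
    The marginal of the cause is coded per environment; the conditional of
    the effect is coded once and shared by all environments (an assumption).
    """
    n_total = sum(sum(env.values()) for env in envs)
    param_bits = 0.5 * math.log2(n_total)  # crude cost per parameter (assumption)
    bits = 0.0
    pooled = Counter()  # (cause value, effect value) -> count over all environments
    for env in envs:
        n = sum(env.values())
        ones = sum(c for pair, c in env.items() if pair[cause] == 1)
        bits += param_bits + bernoulli_bits(ones, n)  # per-environment marginal
        for pair, c in env.items():
            pooled[(pair[cause], pair[1 - cause])] += c
    for v in (0, 1):  # one shared conditional per value of the cause
        total = pooled[(v, 0)] + pooled[(v, 1)]
        bits += param_bits + bernoulli_bits(pooled[(v, 1)], total)
    return bits


# Hypothetical counts: P(Y | X) is the same in both environments, P(X) is not.
envs = [
    Counter({(0, 0): 720, (0, 1): 180, (1, 0): 20, (1, 1): 80}),
    Counter({(0, 0): 80, (0, 1): 20, (1, 0): 180, (1, 1): 720}),
]
lengths = {"X->Y": code_length(envs, 0), "Y->X": code_length(envs, 1)}
best = min(lengths, key=lengths.get)
print(best, lengths)
```

In this toy data the factorization with X as the cause yields the shorter total code, because the shared conditional P(Y | X) fits both environments, whereas a shared P(X | Y) does not. No intervention targets are supplied to the procedure; the direction is chosen by code length alone.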
