
Dictionary Based Pattern Entropy for Causal Direction Discovery

Harikrishnan N B, Shubham Bhilare, Aditi Kathpalia, Nithin Nagaraj

TL;DR

The results demonstrate that minimizing pattern-level uncertainty yields a robust, interpretable, and broadly applicable framework for causal discovery, and provides a principled link between deterministic pattern structure and stochastic variability.

Abstract

Discovering causal direction from temporal observational data is particularly challenging for symbolic sequences, where functional models and noise assumptions are often unavailable. We propose a novel \emph{Dictionary Based Pattern Entropy ($DPE$)} framework that infers both the direction of causation and the specific subpatterns driving changes in the effect variable. The framework integrates \emph{Algorithmic Information Theory} (AIT) and \emph{Shannon Information Theory}. Causation is interpreted as the emergence of compact, rule-based patterns in the candidate cause that systematically constrain the effect. $DPE$ constructs direction-specific dictionaries and quantifies their influence using entropy-based measures, enabling a principled link between deterministic pattern structure and stochastic variability. Causal direction is inferred via a minimum-uncertainty criterion that selects the direction exhibiting stronger and more consistent pattern-driven organization. As summarized in Table 7, $DPE$ consistently achieves reliable performance across diverse synthetic systems, including delayed bit-flip perturbations, AR(1) coupling, 1D skew-tent maps, and sparse processes, outperforming or matching competing AIT-based methods ($ETC_E$, $ETC_P$, $LZ_P$). On biological and ecological datasets, its performance is competitive, although alternative methods show advantages in specific genomic settings. Overall, the results demonstrate that minimizing pattern-level uncertainty yields a robust, interpretable, and broadly applicable framework for causal discovery.
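The abstract names the ingredients of $DPE$ (direction-specific dictionaries, entropy-based influence measures, a minimum-uncertainty decision) without giving their exact definitions. The Python sketch below shows one way such a pipeline could look, under loudly stated assumptions: the dictionary is built by LZ78-style parsing of the candidate cause (a hypothetical choice), a pattern's "induced transition" is taken to be the effect symbol immediately after the pattern ends (lag 1), per-pattern Shannon entropies are averaged with occurrence-count weights, and the direction with the lower average entropy is reported. The function names `lz78_dictionary`, `avg_pattern_entropy`, and `infer_direction` are illustrative, not the authors' code.

```python
import random
from collections import Counter
from math import log2


def lz78_dictionary(seq):
    """Hypothetical dictionary builder: LZ78-style incremental parsing.

    Each new phrase is the shortest substring, starting where the previous
    phrase ended, that is not already in the dictionary.
    """
    phrases, current = set(), ""
    for symbol in seq:
        current += symbol
        if current not in phrases:
            phrases.add(current)
            current = ""
    return phrases


def shannon_entropy(counts):
    """Shannon entropy in bits of an empirical distribution (a Counter)."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def avg_pattern_entropy(cause, effect):
    """Occurrence-weighted mean entropy of the effect symbol that follows
    each dictionary pattern of `cause` (lag-1 assumption, equal lengths)."""
    weighted_sum, total_occ = 0.0, 0
    for pattern in lz78_dictionary(cause):
        m = len(pattern)
        # Effect symbols right after every (possibly overlapping) occurrence.
        nexts = Counter(
            effect[i + m]
            for i in range(len(cause) - m)
            if cause[i:i + m] == pattern
        )
        n = sum(nexts.values())
        if n:
            weighted_sum += n * shannon_entropy(nexts)
            total_occ += n
    return weighted_sum / total_occ if total_occ else float("inf")


def infer_direction(x, y):
    """Minimum-uncertainty rule: the direction with lower average entropy wins."""
    h_xy, h_yx = avg_pattern_entropy(x, y), avg_pattern_entropy(y, x)
    if h_xy < h_yx:
        return "X->Y", h_xy, h_yx
    if h_yx < h_xy:
        return "Y->X", h_xy, h_yx
    return "undecided", h_xy, h_yx


# Usage: Y is X delayed by one step, so X drives Y.
rng = random.Random(0)
x = "".join(rng.choice("01") for _ in range(2000))
y = "0" + x[:-1]
direction, h_xy, h_yx = infer_direction(x, y)
print(direction, round(h_xy, 3), round(h_yx, 3))
```

In this toy example every pattern found in X fixes the next symbol of Y exactly, so the X-to-Y average entropy is 0, while patterns in Y carry essentially no information about future X; the rule therefore reports X->Y. Counting overlapping occurrences and weighting by occurrence count are design assumptions of this sketch, not details taken from the paper.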

Paper Structure

This paper contains 20 sections, 20 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Directed network of symbolic patterns illustrating deterministic structure between $X$ and $Y$. Nodes represent distinct substrings extracted from the driving sequence, and edges encode their weighted entropy with respect to the target sequence. A weighted entropy value close to $0$ indicates that the pattern induces highly deterministic transition behavior in the target, whereas larger values indicate increasing uncertainty in the induced transitions. The figure displays both directions, $X \rightarrow Y$ and $Y \rightarrow X$, enabling comparison of directional determinism.
  • Figure 2: Effect of delayed bit-flip on causal direction detection: accuracy versus bit-flip delay for $DPE$, $ETC_P$, $ETC_E$, and $LZ_P$.
  • Figure 3: Synthetic unidirectional coupling: accuracy of $DPE$, $ETC_E$, $ETC_P$, and $LZ_P$ versus coupling strength.
  • Figure 4: Synthetic unidirectional coupling: average entropy ($\bar{H}_{X \to Y}$ and $\bar{H}_{Y \to X}$) versus coupling strength ($\phi$) for the $DPE$ causal discovery framework. The increasing separation between the two curves indicates improved discrimination between the driving and response variables.
  • Figure 5: Sparse processes: comparison of accuracy for $DPE$, $ETC_E$, $ETC_P$, and $LZ_P$ across varying sparsity levels ($k$).
  • ...and 4 more figures
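Figures 3 and 4 sweep a coupling strength $\phi$ between two AR(1) processes. The exact model is not given in this summary, so the generator below is a hypothetical stand-in: X is an autonomous AR(1) process, Y is an AR(1) process driven by the previous value of X with weight $\phi$, and both are binarized by sign to obtain symbolic sequences. The coefficients `a` and `b`, the Gaussian noise level, the zero threshold, and the helper names `coupled_ar1_symbols` and `lag1_agreement` are all illustrative assumptions.

```python
import random


def coupled_ar1_symbols(n, phi, a=0.5, b=0.5, noise=1.0, seed=0):
    """Hypothetical X -> Y benchmark: two AR(1) processes, binarized at zero.

    X_t = a * X_{t-1} + e_t
    Y_t = b * Y_{t-1} + phi * X_{t-1} + f_t,   e_t, f_t ~ N(0, noise**2)
    """
    rng = random.Random(seed)
    x = y = 0.0
    xs, ys = [], []
    for _ in range(n):
        # Tuple assignment evaluates the right-hand side first, so the Y update
        # sees the previous X value: the coupling acts with a one-step lag.
        x, y = a * x + rng.gauss(0.0, noise), b * y + phi * x + rng.gauss(0.0, noise)
        # Sign binarization turns the real-valued series into symbolic ones.
        xs.append("1" if x > 0 else "0")
        ys.append("1" if y > 0 else "0")
    return "".join(xs), "".join(ys)


def lag1_agreement(x, y):
    """Fraction of steps t where the symbol of Y_t equals that of X_{t-1}."""
    return sum(y[t] == x[t - 1] for t in range(1, len(x))) / (len(x) - 1)


for phi in (0.0, 0.5, 1.0, 2.0):
    xs, ys = coupled_ar1_symbols(5000, phi, seed=1)
    print(f"phi={phi}: lag-1 agreement = {lag1_agreement(xs, ys):.3f}")
```

Agreement close to 0.5 at $\phi = 0$ that rises with $\phi$ only confirms that the coupling knob works; it is not a direction test, because the persistence of X also makes the reverse-lag agreement exceed 0.5. The resulting symbol strings are the kind of input on which a direction test such as $DPE$ would then be evaluated.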