Transfer entropy for finite data
Alec Kirkley
TL;DR
The paper introduces a combinatorial, nonparametric reduced transfer entropy (TE) that corrects the finite-sample positive bias of standard TE and allows statistical significance to be assessed without simulations. By reinterpreting data as finite populations and using a contingency-table encoding, the author derives $H_C(\mathbf{w}|\mathbf{V})$ and the reduced TE $\mathcal{R}^{(k,l)}_{\mathbf{x}\to \mathbf{y}}$, with a guaranteed non-positive correction $\Delta^{(k,l)}_{\mathbf{x}\to \mathbf{y}}$ and MDL-based model selection. The framework asymptotically recovers standard TE but remains reliable for small $N$ or high cardinality $C$, supports multivariate extensions, and enables automatic lag selection. In synthetic and real-data experiments, the method yields more robust, sparse, and interpretable information-flow networks than conventional TE methods, with practical implications for neuroscience, climate science, finance, and conflict data analysis.
Abstract
Transfer entropy is a widely used measure for quantifying directed information flows in complex systems. While the challenges of estimating transfer entropy for continuous data are well known, it has two major shortcomings for data of finite cardinality: it exhibits a substantial positive bias for sparse bin counts, and it has no clear means to assess statistical significance. By computing information content in finite data streams without explicitly considering symbols as instances of random variables, we derive a transfer entropy measure which is asymptotically equivalent to the standard plug-in estimator but remedies these issues for time series of small size and/or high cardinality, permitting a fully nonparametric assessment of statistical significance without simulation.
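To illustrate the positive bias described above, the following is a minimal sketch of the standard plug-in transfer entropy estimator for symbolic time series. It is not the reduced measure proposed in the paper. The function name `plugin_transfer_entropy`, the history lengths `k` (target) and `l` (source), and the synthetic series are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter


def plugin_transfer_entropy(x, y, k=1, l=1):
    """Standard plug-in TE from x to y, in bits (illustrative sketch).

    k: number of past symbols of the target y used as history.
    l: number of past symbols of the source x used as history.
    """
    m = max(k, l)
    triples = Counter()  # counts of (y_t, y_past, x_past)
    for t in range(m, len(y)):
        y_past = tuple(y[t - k:t])
        x_past = tuple(x[t - l:t])
        triples[(y[t], y_past, x_past)] += 1

    n = sum(triples.values())
    c_yt_yp = Counter()  # counts of (y_t, y_past)
    c_yp_xp = Counter()  # counts of (y_past, x_past)
    c_yp = Counter()     # counts of y_past
    for (yt, yp, xp), c in triples.items():
        c_yt_yp[(yt, yp)] += c
        c_yp_xp[(yp, xp)] += c
        c_yp[yp] += c

    te = 0.0
    for (yt, yp, xp), c in triples.items():
        # Ratio p(y_t | y_past, x_past) / p(y_t | y_past) from empirical counts.
        ratio = (c / c_yp_xp[(yp, xp)]) / (c_yt_yp[(yt, yp)] / c_yp[yp])
        te += (c / n) * math.log2(ratio)
    return te


rng = random.Random(0)
C = 8  # alphabet size (cardinality), an arbitrary illustrative choice

# Two independent series: the true transfer entropy is zero.
x_small = [rng.randrange(C) for _ in range(50)]
y_small = [rng.randrange(C) for _ in range(50)]
x_large = [rng.randrange(C) for _ in range(50_000)]
y_large = [rng.randrange(C) for _ in range(50_000)]

te_small = plugin_transfer_entropy(x_small, y_small)
te_large = plugin_transfer_entropy(x_large, y_large)

# A series in which y copies x with a one-step delay: true TE is about 1 bit.
x_bin = [rng.randrange(2) for _ in range(2000)]
y_bin = [0] + x_bin[:-1]
te_copy = plugin_transfer_entropy(x_bin, y_bin)

print(f"independent, N=50:     {te_small:.3f} bits")
print(f"independent, N=50000:  {te_large:.3f} bits")
print(f"delayed copy, N=2000:  {te_copy:.3f} bits")
```

Because the plug-in estimate is a conditional mutual information of the empirical distribution, it can never be negative, so sampling noise only pushes it upward. For the short, high-cardinality series most history contexts are observed once or not at all, and the estimate is far from zero even though the two series are independent. Nothing in the number itself says whether it is significant.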
