Directed Information: Estimation, Optimization and Applications in Communications and Causality

Dor Tsur; Oron Sabag; Navin Kashyap; Haim Permuter; Gerhard Kramer

TL;DR

This work presents directed information (DI) as a rigorous, causality-aware measure of information flow between stochastic processes and links it to the capacity of channels with memory and feedback. It develops the theoretical foundations of causal conditioning, multivariate DI, and the DI rate, and relates DI to transfer entropy, Granger causality, and Pearl’s do-calculus. The monograph surveys DI estimation methods, from plug-in and CTW estimators to neural estimators such as DINE, and discusses how to optimize estimated DI for capacity problems, including NDT-based approaches for continuous inputs and RL-based strategies for discrete settings. A major portion focuses on the feedback capacity of finite-state channels, presenting MDP formulations, Bellman equations, Q-graph and dual bounds, and reinforcement-learning-based algorithms (DDPG, POU) for capacity evaluation, with exact solutions in notable cases such as the Ising channel. Together, these results form a toolkit for analyzing, estimating, and optimizing directional information flow in communication systems with memory and feedback, with implications for causal inference and control.

Abstract

Directed information (DI) is an information measure that attempts to capture directionality in the flow of information from one random process to another. It is closely related to other causal influence measures, such as transfer entropy, Granger causality, and Pearl's causal framework. This monograph provides an overview of DI and its main application in information theory, namely, characterizing the capacity of channels with feedback and memory. We begin by reviewing the definitions of DI, its basic properties, and its relation to Shannon's mutual information. Next, we provide a survey of DI estimation techniques, ranging from classic plug-in estimators to modern neural-network-based estimators. Considering the application of channel capacity estimation, we describe how such estimators numerically optimize DI rate over a class of joint distributions on input and output processes. A significant part of the monograph is devoted to techniques to compute the feedback capacity of finite-state channels (FSCs). The feedback capacity of a strongly connected FSC involves the maximization of the DI rate from the channel input process to the output process. This maximization is performed over the class of causal conditioned probability input distributions. When the FSC is also unifilar, i.e., the next state is given by a time-invariant function of the current state and the new input-output symbol pair, the feedback capacity is the optimal average reward of an appropriately formulated Markov decision process (MDP). This MDP formulation has been exploited to develop several methods to compute exactly, or at least estimate closely, the feedback capacity of a unifilar FSC. This monograph describes these methods, starting from the value iteration algorithm, to Q-graph methods, and reinforcement learning algorithms that can handle large input and output alphabets.
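To make the relation between DI and mutual information concrete, recall Massey's definition $I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$. The following is a minimal sketch, not taken from the monograph, that evaluates this sum exactly from a known joint pmf. The helper names (`marginal`, `cond_mi`, `directed_information`) and the toy distribution `pmf_echo` are assumptions made for illustration; outcomes are tuples ordered as $(x_1, \dots, x_n, y_1, \dots, y_n)$.

```python
import math
from collections import defaultdict


def marginal(pmf, keep):
    """Marginalize a joint pmf (dict: outcome tuple -> prob) onto coordinates `keep`."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def cond_mi(pmf, a, b, c):
    """Conditional mutual information I(A; B | C) in bits; a, b, c are index lists."""
    p_abc = marginal(pmf, a + b + c)
    p_ac = marginal(pmf, a + c)
    p_bc = marginal(pmf, b + c)
    p_c = marginal(pmf, c)
    total = 0.0
    for key, p in p_abc.items():
        if p <= 0.0:
            continue
        ka = key[:len(a)]
        kb = key[len(a):len(a) + len(b)]
        kc = key[len(a) + len(b):]
        total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


def directed_information(pmf, n):
    """I(X^n -> Y^n) = sum_i I(X^i; Y_i | Y^{i-1}).

    Coordinates 0..n-1 hold x_1..x_n and coordinates n..2n-1 hold y_1..y_n.
    """
    return sum(
        cond_mi(pmf, list(range(i + 1)), [n + i], list(range(n, n + i)))
        for i in range(n)
    )


# Hypothetical toy example: Y_1 is a fair coin, X_1 = 0 and Y_2 = 0 are constant,
# and the input merely echoes the previous output through feedback (X_2 = Y_1).
pmf_echo = {(0, w, w, 0): 0.5 for w in (0, 1)}

di = directed_information(pmf_echo, n=2)     # information flowing from X to Y
mi = cond_mi(pmf_echo, [0, 1], [2, 3], [])   # symmetric mutual information
print(f"DI = {di:.3f} bits, MI = {mi:.3f} bits")
```

In this toy case the mutual information is 1 bit while the DI from input to output is 0, since all dependence is created by the feedback link rather than by the channel. The exhaustive sum scales exponentially in $n$, so it is only suitable for very small alphabets and horizons.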

Paper Structure (60 sections, 15 theorems, 196 equations, 16 figures, 2 tables, 2 algorithms)

Introduction
Information Theory and Causality
Marko's Bidirectional Communication
Overview of Applications of the Directed Information
Challenges
Directed Information and Causal Conditioning
Notation
Definition and Causal Conditioning
Causally Conditioned Distribution
Causally Conditioned Entropy and Mutual Information
Causal Conditioned Directed Information and Multivariate Variants
Properties and Decompositions
Directed Information Rate
Relation to Transfer Entropy and Granger Causality
Transfer Entropy
...and 45 more sections

Key Result

Theorem 2.1

Let $S,T,S_d\subseteq\{1,\dots,n\}$ be pairwise disjoint sets of an SCM. If $S_d$ consists only of non-descendants of $S$ and $I(A^T\to A^S\mid A^{S_d})=0$, then the causal effect of intervening on $S$ is computable by adjustment:
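For orientation, in the single-confounder SCM of Figure 5(a), adjustment takes the familiar back-door form $P(y \mid \mathrm{do}(x)) = \sum_z P(y \mid x, z)\,P(z)$. This is Pearl's standard formula rather than the theorem's own display, and reading $Z$ as the adjustment set $S_d$ is an assumption made here. The sketch below uses hypothetical probability tables to show how the adjusted (interventional) quantity differs from the plain conditional $P(y \mid x)$.

```python
# Hypothetical binary SCM with edges Z -> X, Z -> Y, X -> Y (all numbers are made up).
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
p_y1_given_xz = {                           # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.7,
}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def interventional(x):
    """Back-door adjustment: P(Y = 1 | do(X = x)) = sum_z P(Y = 1 | x, z) P(z)."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in p_z)


def observational(x):
    """Plain conditioning: P(Y = 1 | X = x), which mixes in the confounder's influence."""
    num = sum(p_z[z] * p_x_given_z(x, z) * p_y1_given_xz[(x, z)] for z in p_z)
    den = sum(p_z[z] * p_x_given_z(x, z) for z in p_z)
    return num / den


print(f"P(Y=1 | do(X=1)) = {interventional(1):.2f}")
print(f"P(Y=1 | X=1)     = {observational(1):.2f}")
```

With these tables the adjusted probability is 0.50 and the observational conditional is 0.62; the gap is the contribution of the back-door path $X \leftarrow Z \rightarrow Y$.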

Figures (16)

Figure 1: The founders of directed information.
Figure 2: Two-way communication. Messages are transmitted in both directions. Feedback can be used in such communication systems.
Figure 3: Simple depiction of information flow.
Figure 4: InfoMat visualization on the Ising channel under (a) a channel-oblivious scheme and (b) the capacity-achieving scheme of Elishco and Permuter (2014). DI corresponds to the sum of the upper-triangular entries.
Figure 5: (a) The original SCM: $Z$ confounds $X$ and $Y$, opening the back-door path $X \leftarrow Z \rightarrow Y$. (b) After intervening with $\mathrm{do}(X{=}x)$, the incoming arrow to $X$ is removed and replaced by the intervention mechanism.
...and 11 more figures

Theorems & Definitions (29)

Remark 2.1: DI on abstract spaces
Example 2.1
Remark 2.2: DI in continuous time
Theorem 2.1: Back-door criterion via directed information (Raginsky, 2011)
Theorem 3.1
Remark 3.1: Curse of dimensionality
Example 4.1: Memoryless channels
Example 4.2: Intersymbol interference
Example 4.3: Independent states
Example 4.4: Markovian states
...and 19 more
