Abstract Markov Random Fields

Leon Lang, Clélia de Mulatier, Rick Quax, Patrick Forré

TL;DR

The paper generalizes information-diagram techniques from Shannon entropy to a broad class of chain-rule functions $F$ by introducing $F$-independence, $F$-mutual independence, and $F$-dual total correlation. Using the generalized Hu theorem, it constructs $F$-diagrams that visualize higher-order terms for sets of variables and proves that $F$-Markov random fields are exactly those whose $F$-diagram regions corresponding to graph-disconnected vertex sets vanish. The authors develop subset determination and the separoid framework to extend Yeung's results to arbitrary $F$, with specialized applications to probabilistic models, Kullback-Leibler diagrams on Markov chains, and the diffusion-model ELBO decomposition. They demonstrate a diagrammatic representation of a weak second law of thermodynamics and provide a simple KL-decomposition of the diffusion ELBO, illustrating the practical impact for machine learning and statistical modeling. Overall, the work unifies high-order dependence concepts under $F$-diagrams and provides a foundation for analyzing graphical-model structures with general information measures.

Abstract

Markov random fields are known to be fully characterized by properties of their information diagrams, or I-diagrams. In particular, for Markov random fields, regions in the I-diagram corresponding to disconnected vertex sets in the graph vanish. Recently, I-diagrams have been generalized to F-diagrams, for a larger class of functions F satisfying the chain rule beyond Shannon entropy, such as Kullback-Leibler divergence and cross-entropy. In this work, we generalize the notion and characterization of Markov random fields to this larger class of functions F and investigate preliminary applications. We define F-independences, F-mutual independences, and F-Markov random fields and characterize them by their F-diagram. In the process, we also define F-dual total correlation and prove that its vanishing is equivalent to F-mutual independence. We then apply our results to information functions F that are applied to probability mass functions. We show that if the probability distributions of a set of random variables are Markov random fields for the same graph, then we formally recover the notion of an F-Markov random field for that graph. We then study the Kullback-Leibler diagrams on specific Markov chains, leading to a visual representation of the second law of thermodynamics and a simple explicit derivation of the decomposition of the evidence lower bound for diffusion models.
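One standard form of the second law for Markov chains says that the Kullback-Leibler divergence between two distributions cannot increase when both are evolved by the same transition kernel. Whether this is exactly the form the paper visualizes is an assumption here; the kernel and the starting distributions below are hypothetical and chosen only for illustration. A minimal numerical sketch:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def step(dist, kernel):
    """Push a distribution one step through a row-stochastic transition kernel."""
    n = len(kernel[0])
    return [sum(dist[i] * kernel[i][j] for i in range(len(dist))) for j in range(n)]

# Hypothetical 3-state kernel; each row sums to 1.
kernel = [[0.8, 0.2, 0.0],
          [0.1, 0.7, 0.2],
          [0.3, 0.3, 0.4]]
p = [0.9, 0.05, 0.05]   # hypothetical initial distribution P_0
q = [0.2, 0.3, 0.5]     # hypothetical initial distribution Q_0

divergences = []
for _ in range(6):
    divergences.append(kl(p, q))
    p, q = step(p, kernel), step(q, kernel)

# The sequence D(P_t || Q_t) should be non-increasing in t.
monotone = all(a >= b - 1e-12 for a, b in zip(divergences, divergences[1:]))
```

The monotonicity follows from the data-processing inequality for KL divergence, so it should hold for any row-stochastic kernel, not just this one.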

Paper Structure

This paper contains 31 sections, 46 theorems, 196 equations, and 9 figures.

Key Result

Proposition 2.1

Equivalence classes of random variables, together with the multiplication given by the join operation $X \cdot Y \coloneqq XY$ and the neutral element given by $\boldsymbol{1}: \Omega \to \ast$, form a commutative, idempotent monoid.
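The monoid laws can be checked on a small finite example. The sketch below assumes that two random variables are equivalent when each is a function of the other, so that an equivalence class is determined by the partition it induces on a finite $\Omega$; the join $XY$ then becomes the common refinement of the two partitions. The helper names `partition_of` and `join` are hypothetical and not taken from the paper.

```python
def partition_of(omega, f):
    """Partition of omega induced by the random variable f."""
    blocks = {}
    for w in omega:
        blocks.setdefault(f(w), set()).add(w)
    return frozenset(frozenset(b) for b in blocks.values())

def join(p, q):
    """Join XY: the common refinement of two partitions."""
    return frozenset(a & b for a in p for b in q if a & b)

omega = range(6)
X = partition_of(omega, lambda w: w % 2)
Y = partition_of(omega, lambda w: w % 3)
Z = partition_of(omega, lambda w: w < 2)
ONE = partition_of(omega, lambda w: 0)  # constant variable: a single block

# The joint variable (X, Y) induces the same partition as the join.
XY = partition_of(omega, lambda w: (w % 2, w % 3))

commutative = join(X, Y) == join(Y, X)
idempotent = join(X, X) == X
neutral = join(X, ONE) == X
associative = join(join(X, Y), Z) == join(X, join(Y, Z))
```

Representing classes as partitions makes equality of classes a plain set comparison, which is why the four laws reduce to the equality checks above.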

Figures (9)

  • Figure 1: A depiction of $\widetilde{X} = \widetilde{X}(n)$ for $n = 3$ and $n = 4$, which will later be used to represent all the (higher-order) information functions.
  • Figure 2: A depiction of the $I$-diagram and the $F$-diagram from the (generalized) Hu theorem. On the left, it shows the interplay of (conditional) Shannon entropy, mutual information, and interaction information for three random variables $X, Y, Z$. On the right, $X, Y$ and $Z$ are elements of a commutative, idempotent monoid, generalizing the collection of equivalence classes of random variables together with their join operation, and the higher-order terms are all derived from a function $F$ satisfying the chain rule $F(XY) = F(X) + X.F(Y)$.
  • Figure 3: We show the effect of the graph structure of simple Markov random fields on the corresponding $I$-diagrams for a fixed probability mass function $P$. The $I$-diagram visualizes the relationships of the entropy, mutual information, and interaction information of the variables. The figure is based on Yeung's result, Theorem \ref{thm:charac_proba_mrfs}, which shows that a set of random variables forms a Markov random field corresponding to a graph if and only if all atoms in the $I$-diagram corresponding to disconnected sets of vertices in the graph disappear. Concretely, to a set of vertices $J$, the corresponding atom in the $I$-diagram is the intersection of all disks with indices $j \in J$, without any element in the union of all the other disks. For the lower left panel, the three sets of vertices $\{1, 2\}$, $\{2, 3\}$, and $\{1, 2, 3\}$ are disconnected, giving rise to three disappearing atoms in the $I$-diagram. Consequently, $I(X_2)$ can be drawn to not intersect with the other disks. However, the other four sets of vertices $\{1\}, \{2\}, \{3\}, \{1, 3\}$ are clearly connected, which means that we cannot infer their corresponding atoms to vanish. Similar reasoning applies to the other three panels.
  • Figure 4: If random variables $X_1, \dots, X_n$ form a $P$-Markov chain, then many atoms in the $I$-diagram with respect to $P$ disappear by Corollary \ref{cor:Markov_Chain_Charac_proba}. The only atoms that remain are those corresponding to "intervals" in $[n]$. This leads to a fan-like structure of the $I$-diagram with respect to $P$, as visualized here for $n = 3$ and $n = 5$.
  • Figure 5: One key ingredient in the $F$-diagram characterization of $F$-Markov random fields is the characterization of $F$-mutual independences, Theorem \ref{thm:charac_using_dual}, here visualized for three elements $X_1, X_2, X_3 \in M$. The characterization shows that the mutual independence is equivalent to the vanishing of $F$-dual total correlation $DTC_{F}(X_1; X_2; X_3)$, which, by Hu's theorem, Theorem \ref{thm:hu_kuo_ting_generalized}, corresponds to the vanishing of a region of four atoms in the $F$-diagram, shown in gray. Subset determination, Theorem \ref{thm:subset_determination}, then allows us to conclude that every individual atom in this region vanishes, as shown in the rightmost part of the figure. The implication from right to left again follows from Hu's theorem and the fact that $F$-diagrams visualize a measure, meaning that larger regions are the sum of their atoms.
  • ...and 4 more figures
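
The chain rule from Figure 2 and the vanishing non-interval atom from Figure 4 can be checked numerically for Shannon entropy and $n = 3$. The sketch below assumes a hypothetical binary Markov chain $X_1 \to X_2 \to X_3$ with made-up transition probabilities, and reads $X.F(Y)$ as averaged conditioning, i.e. the $P$-weighted average of the entropies of the conditional distributions. Indices in the code are 0-based, so `(0, 1)` stands for $X_1 X_2$.

```python
import itertools
import math

def entropy(p):
    """Shannon entropy in bits of a pmf given as a dict of probabilities."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for xs, v in joint.items():
        key = tuple(xs[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out

# Hypothetical Markov chain X1 -> X2 -> X3 on binary states.
p1 = {0: 0.3, 1: 0.7}
k12 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
k23 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}
joint = {(a, b, c): p1[a] * k12[a][b] * k23[b][c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def H(idx):
    return entropy(marginal(joint, idx))

# Chain rule H(X1 X2) = H(X1) + X1.H(X2), with averaged conditioning.
avg_cond = sum(p1[a] * entropy(k12[a]) for a in p1)
chain_rule_gap = abs(H((0, 1)) - (H((0,)) + avg_cond))

# Atom for the non-interval set {1, 3}: the conditional mutual information
# I(X1; X3 | X2) = H(X1 X2) + H(X2 X3) - H(X2) - H(X1 X2 X3).
atom_13 = H((0, 1)) + H((1, 2)) - H((1,)) - H((0, 1, 2))

# An interval atom for comparison: I(X1; X2 | X3).
atom_12 = H((0, 2)) + H((1, 2)) - H((2,)) - H((0, 1, 2))
```

For this chain `atom_13` is zero up to floating-point error, while the interval atom `atom_12` is strictly positive, matching the fan-like picture in which only interval atoms survive.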

Theorems & Definitions (123)

  • Proposition 2.1: The Monoid of Random Variables
  • Proof
  • Definition 2.2: Shannon Entropy
  • Definition 2.3: Averaged Conditioning
  • Proposition 2.4
  • Proposition 2.5: Chain Rule
  • Definition 2.6: Mutual Information, Interaction Information
  • Definition 2.7: ($G$-Valued) Measure
  • Theorem 2.8: Generalized Hu Theorem, Lang2022
  • Definition 2.9: Graph
  • ...and 113 more