Neural Autoregressive Flows for Markov Boundary Learning

Khoa Nguyen, Bao Duong, Viet Huynh, Thin Nguyen

Abstract

Recovering the Markov boundary -- the minimal set of variables that maximizes predictive performance for a response variable -- is crucial in many applications. While recent advances improve upon traditional constraint-based techniques by scoring local causal structures, they still rely on nonparametric estimators and heuristic searches, and so lack theoretical guarantees of reliability. This paper investigates a framework for efficient Markov boundary discovery that uses conditional entropy, from information theory, as the scoring criterion. We design a novel masked autoregressive network to capture complex dependencies. We propose a parallelizable greedy search strategy that runs in polynomial time and is supported by analytical evidence. We also discuss how initializing a graph with learned Markov boundaries accelerates the convergence of causal discovery. Comprehensive evaluations on real-world and synthetic datasets demonstrate the scalability and superior performance of our method in both Markov boundary discovery and causal discovery tasks.

Paper Structure

This paper contains 22 sections, 4 theorems, 37 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Theorem 2

The MB $\mathcal{MB}_{T}\subset\mathbf{V}$ of random variable $T\in\mathbf{V}$ is the subset $\mathbf{S}\subset\mathbf{V}\setminus\{T\}$ that minimizes $H\left(T\mid\mathbf{S}\right)$, where $H(\cdot\mid\cdot)$ is the conditional entropy. Furthermore, $\forall\mathbf{S'}\subseteq\mathbf{V}\setminus\{T\
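
The first sentence of the theorem can be illustrated on a small discrete example. The sketch below is not the paper's estimator: it assumes a hypothetical toy chain $X_1 \rightarrow T \rightarrow X_2$ with an independent variable $X_3$, with made-up flip probabilities, and computes $H(T\mid\mathbf{S})$ exactly by enumeration rather than with a neural flow. All names (`joint_table`, `cond_entropy`, `smallest_minimizer`) are illustrative. Under these assumptions the smallest subset attaining the minimum conditional entropy should be $\{X_1, X_2\}$, which is the Markov boundary of $T$ in this graph.

```python
from itertools import combinations, product
from math import log2

NAMES = ("X1", "T", "X2", "X3")  # hypothetical toy variables


def joint_table():
    """Exact joint distribution of the chain X1 -> T -> X2, with X3 independent."""
    table = {}
    for x1, t, x2, x3 in product((0, 1), repeat=4):
        p = 0.5                           # P(X1): fair coin (assumed)
        p *= 0.8 if t == x1 else 0.2      # P(T | X1): copy X1, flip w.p. 0.2 (assumed)
        p *= 0.9 if x2 == t else 0.1      # P(X2 | T): copy T, flip w.p. 0.1 (assumed)
        p *= 0.5                          # P(X3): independent fair coin (assumed)
        table[(x1, t, x2, x3)] = p
    return table


def cond_entropy(table, target, given):
    """H(target | given) in bits, computed from the exact joint table."""
    ti = NAMES.index(target)
    gi = [NAMES.index(g) for g in given]
    p_s, p_ts = {}, {}
    for state, p in table.items():
        s = tuple(state[i] for i in gi)
        p_s[s] = p_s.get(s, 0.0) + p
        p_ts[(state[ti], s)] = p_ts.get((state[ti], s), 0.0) + p
    return -sum(p * log2(p / p_s[s]) for (_, s), p in p_ts.items() if p > 0)


table = joint_table()
candidates = [n for n in NAMES if n != "T"]

# Score every subset S of V \ {T}; subsets are generated in order of size.
scores = {
    subset: cond_entropy(table, "T", subset)
    for r in range(len(candidates) + 1)
    for subset in combinations(candidates, r)
}

best = min(scores.values())
# First (hence smallest) subset whose score matches the minimum up to rounding.
smallest_minimizer = next(s for s in scores if scores[s] <= best + 1e-9)

for subset, h in scores.items():
    print(subset, round(h, 4))
print("smallest minimizer:", smallest_minimizer)
```

Exhaustive enumeration is exponential in the number of variables and is used here only to make the criterion concrete. Adding the independent variable $X_3$ to $\{X_1, X_2\}$ leaves the score unchanged, which is why the smallest minimizer, not just any minimizer, is selected.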

Figures (2)

  • Figure 1: Illustration of the weight-masking mechanism in FANS, as described in Subsection \ref{subsect:asmaf}, for the specific subset $\mathbf{S}=\{X_{2},X_{3},X_{5}\}$. For readability, this simplifies the original illustration of the MADE architecture, showing only a hidden layer with a block of nodes and an output layer with a block of nodes. A full autoencoder may consist of multiple hidden layers, each with several blocks of hidden nodes, which can be permuted, as shown in Fig. 1 of \cite{germain2015made}. Solid connections represent the fully connected weight regions in Step 1, while red connections indicate the autoregressive masks in Step 2. Other subsets have different masks. In the compact version (right figure), assuming the maximum size of observed subsets is 3, each block of the hidden layer reduces to 2 nodes. The original element indices of $\mathbf{S}$ are rescaled and colliders resolved: $[2,3,5]\rightarrow\left[\left\lceil \frac{2\cdot 2}{5-1}\right\rceil ,\left\lceil \frac{3\cdot 2}{5-1}\right\rceil ,\left\lceil \frac{5\cdot 2}{5-1}\right\rceil \right]\rightarrow[1,2,3]$.
  • Figure 2: Different model options for estimating our score criterion, the conditional entropy. Experiments are conducted on synthetic nonlinear graphs with $d=100$. FANS-CPT, FANS-FN, and FANS-FN-NUL represent three variants in our proposed class of masked autoregressive flows (MAFs) tailored for MB discovery, corresponding to a compact version with leaf masks, a non-compact version with leaf masks, and a non-compact version without leaf masks. NAF and BNAF are typical MAF classes, while ACE \cite{strauss2021arbitrary} represents a class of energy-based estimators. The left figure shows the reduction over training epochs of $\sum H(X\mid\mathcal{MB}_{X})$, the total conditional entropy when each variable is conditioned on its MB; smaller values indicate better performance. The right figure shows the number of trainable parameters; fewer parameters imply greater efficiency. When FANS is fully parameterized and trained with leaf masking (FANS-FN), it learns even better than its compact counterpart. However, the compact FANS, with half the parameters of NAF, is expressive enough to surpass NAF in training and estimation, demonstrating the efficacy of our FANS class, as discussed in Subsection \ref{subsect:asmaf}.
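
The index rescaling quoted in the Figure 1 caption can be checked with a few lines of arithmetic. The sketch below only reproduces the ceiling computation $\lceil i \cdot 2 / (5-1) \rceil$ for the indices $[2,3,5]$. The function name `rescale_indices` and the parameter names `scale` and `denom` are hypothetical, and the caption does not say what the constants 2 and 5 stand for in general, so they are passed through as plain numbers. The "colliders resolved" step is not modeled, since the caption gives no rule for it.

```python
def rescale_indices(indices, scale, denom):
    """Return ceil(i * scale / denom) for each index, using integer arithmetic."""
    # -(-a // b) is the integer ceiling of a / b for positive b.
    return [-(-(i * scale) // denom) for i in indices]


subset_indices = [2, 3, 5]            # indices of S = {X2, X3, X5} from the caption
rescaled = rescale_indices(subset_indices, scale=2, denom=5 - 1)
print(rescaled)  # [1, 2, 3]
```

For this subset the three ceilings are already distinct, so the output matches the caption's final list without any further adjustment.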

Theorems & Definitions (8)

  • Definition 1: Markov boundary (MB)
  • Theorem 2
  • Remark 3
  • Theorem 4: Error bound of FANSITEMB
  • Theorem 5: Soundness of FANSITEMB
  • Definition 6: Moral graph
  • Remark 7
  • Proposition 8