
Scalable Structure Learning for Sparse Context-Specific Systems

Felix Leopoldo Rios, Alex Markham, Liam Solus

TL;DR

We present an algorithm for learning context-specific models that scales to hundreds of variables. It combines an order-based Markov chain Monte Carlo search with a novel context-specific sparsity assumption, analogous to the sparsity assumptions typically invoked for directed acyclic graphical models.

Abstract

Several approaches to graphically representing context-specific relations among jointly distributed categorical variables have been proposed, along with structure learning algorithms. Existing optimization-based methods have limited scalability because the number of context-specific models is large. Constraint-based methods are even more prone to error than constraint-based directed acyclic graph learning algorithms, because they must test more relations. We present an algorithm for learning context-specific models that scales to hundreds of variables. Scalable learning is achieved through a combination of an order-based Markov chain Monte Carlo search and a novel context-specific sparsity assumption that is analogous to those typically invoked for directed acyclic graphical models. Unlike previous Markov chain Monte Carlo search methods, our Markov chain is guaranteed to have the true posterior of the variable orderings as its stationary distribution. To implement the method, we solve a first case of an open problem recently posed by Alon and Balogh. Future work solving increasingly general instances of this problem would allow our methods to learn increasingly dense models. The method is shown to perform well on synthetic data and real-world examples, in terms of both accuracy and scalability.
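
The abstract's guarantee that the chain has the true order posterior as its stationary distribution is the kind of property that a Metropolis-Hastings chain provides by construction. The following is a minimal sketch of that mechanism, not the paper's sampler. It assumes a symmetric proposal that swaps two adjacent variables in the order, because the paper's actual moves are not described here. It also uses arbitrary numbers as a hypothetical stand-in for log p(order | data). The helper names `adjacent_swaps` and `mh_transition_matrix` are illustrative. The code builds the exact transition matrix over all orders of three variables and checks that the target distribution is stationary.

```python
import itertools

import numpy as np


def adjacent_swaps(order):
    """Orders reachable by swapping two adjacent positions.

    Every order of n variables has exactly n - 1 such neighbours, so proposing
    one of them uniformly at random is a symmetric proposal.
    """
    neighbours = []
    for k in range(len(order) - 1):
        o = list(order)
        o[k], o[k + 1] = o[k + 1], o[k]
        neighbours.append(tuple(o))
    return neighbours


def mh_transition_matrix(orders, log_target):
    """Exact Metropolis-Hastings transition matrix over all orders (hypothetical helper)."""
    index = {o: n for n, o in enumerate(orders)}
    P = np.zeros((len(orders), len(orders)))
    for o in orders:
        neighbours = adjacent_swaps(o)
        for nb in neighbours:
            # Propose nb with probability 1/len(neighbours); accept with the MH ratio.
            accept = min(1.0, np.exp(log_target[nb] - log_target[o]))
            P[index[o], index[nb]] += accept / len(neighbours)
        # Rejected proposals leave the chain at the current order.
        P[index[o], index[o]] = 1.0 - P[index[o]].sum()
    return P


# Hypothetical stand-in for log p(order | data): arbitrary numbers, one per order.
orders = list(itertools.permutations(range(3)))
rng = np.random.default_rng(1)
log_target = dict(zip(orders, rng.normal(size=len(orders))))

P = mh_transition_matrix(orders, log_target)
pi = np.exp([log_target[o] for o in orders])
pi /= pi.sum()

print(np.allclose(pi @ P, pi))  # True: pi is stationary for P
```

Adjacent swaps connect every pair of orders, so this chain is also irreducible. In practice the chain would be simulated rather than tabulated, because the number of orders grows factorially with the number of variables.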

Paper Structure (25 sections, 15 theorems, 53 equations, 6 figures, 4 tables, 1 algorithm)

Key Result

Theorem 4.1

There are $1 - \binom{i}{2} + \sum_{k=1}^{i} i^{d_k}$ stagings $\mathbf{s}_i$ of level $i$ in which each stage has at most two context variables, that is, stagings with $\mathop{\mathrm{ms}}\nolimits(\mathbf{s}_i) \leq 2$.
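
As a sanity check on the count in Theorem 4.1, the following sketch enumerates stagings by brute force for a few tiny cases and compares the results with the closed form. This snippet does not define the terms involved, so the code rests on three assumptions:

- $d_k \geq 2$ is the number of outcomes of $X_k$.
- A staging of level $i$ is a partition of the outcomes of $(X_1, \dots, X_i)$ into stages.
- Each stage is the set of outcomes that agree with a context $\mathbf{x}_C$. Under this reading, $\mathop{\mathrm{ms}}\nolimits(\mathbf{s}_i) \leq 2$ means every stage has $|C| \leq 2$.

The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def formula_count(d):
    """Closed form of Theorem 4.1: 1 - C(i, 2) + sum_k i ** d_k, with d = (d_1, ..., d_i)."""
    i = len(d)
    return 1 - comb(i, 2) + sum(i ** dk for dk in d)


def context_stages(d, max_context=2):
    """All stages given by a context x_C with |C| <= max_context (assumes every d_k >= 2).

    A stage is the set of outcomes of (X_1, ..., X_i) that agree with x_C.
    """
    i = len(d)
    outcomes = list(product(*(range(dk) for dk in d)))
    stages = set()
    for size in range(max_context + 1):
        for C in combinations(range(i), size):
            for values in product(*(range(d[c]) for c in C)):
                stages.add(frozenset(
                    x for x in outcomes
                    if all(x[c] == v for c, v in zip(C, values))
                ))
    return outcomes, stages


def brute_force_count(d, max_context=2):
    """Count partitions of the outcome space into context-defined stages (exact cover)."""
    outcomes, stages = context_stages(d, max_context)

    def count(uncovered):
        if not uncovered:
            return 1
        # Branch on the smallest uncovered outcome so each partition is counted once.
        first = min(uncovered)
        return sum(count(uncovered - s) for s in stages
                   if first in s and s <= uncovered)

    return count(frozenset(outcomes))


for d in [(2,), (2, 2), (3, 2), (2, 2, 2)]:
    print(d, formula_count(d), brute_force_count(d))  # the two counts agree in each case
```

Brute-force enumeration grows very quickly with the number and size of the variables, so it is only practical for a handful of small cases. The closed form is what makes the sparse staging space usable at scale.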

Figures (6)

  • Figure 1: A CStree on four binary variables and its more compact LDAG representation. In the LDAG, an edge $i\rightarrow j$ vanishes whenever the outcome $\mathbf{x}_{\mathop{\mathrm{pa}}\nolimits_{\mathcal{G}}(j)\setminus \{i\}}$ in the edge label is realized. The notation $\ast$ indicates that any outcome of the particular variable is included in the edge label. Note that the single blue stage in the CStree corresponds to all of the context-specific relations captured by the edge labels $(\ast, 0)$ of $1\rightarrow 4$ and the labels of $2\rightarrow 4$. This reflects how the single CStree relation $X_4 \perp\!\!\!\perp \mathbf{X}_{1,2} \mid X_3 = 0$ is encoded in the LDAG, which is defined for pairwise relations, as discussed in the related-work section. Specifically, this CStree relation implies the pairwise relations $X_4 \perp\!\!\!\perp X_{1} \mid X_2, X_3 = 0$ and $X_4 \perp\!\!\!\perp X_{2} \mid X_1, X_3 = 0$ by basic properties of conditional independence (weak union).
  • Figure 2: Accuracy of CSlearn for different choices of Phase 1 methods and parameter estimators. Plots present results on a semilog scale.
  • Figure 3: Accuracy comparison of CSlearn with two staged tree algorithms. Plots present results on a semilog scale.
  • Figure 4: Runtime results.
  • Figure 5: LDAG of the CStree learned for the ALARM data.
  • ...and 1 more figure
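
The caption of Figure 1 states that the single CStree relation $X_4 \perp\!\!\!\perp \mathbf{X}_{1,2} \mid X_3 = 0$ implies the two pairwise relations encoded in the LDAG. The following is a minimal numerical sketch of that implication. It assumes the variable order $1 < 2 < 3 < 4$ and uses randomly drawn conditional probability tables, which are hypothetical and not taken from the figure. Only the blue stage is imposed: all nodes with $X_3 = 0$ share one distribution for $X_4$. The helper `pairwise_ci` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional probability tables for binary X1, ..., X4 (order 1 < 2 < 3 < 4).
p1 = rng.uniform(0.1, 0.9)                   # P(X1 = 1)
p2 = rng.uniform(0.1, 0.9, size=2)           # P(X2 = 1 | x1)
p3 = rng.uniform(0.1, 0.9, size=(2, 2))      # P(X3 = 1 | x1, x2)
p4 = rng.uniform(0.1, 0.9, size=(2, 2, 2))   # P(X4 = 1 | x1, x2, x3)

# The blue stage: every node with X3 = 0 shares one distribution for X4,
# which is the CStree relation X4 ⊥ (X1, X2) | X3 = 0.
p4[:, :, 0] = 0.3
# Outside the stage, let X4 depend strongly on X1 for contrast.
p4[:, :, 1] = np.array([0.1, 0.9])[:, None]


def bern(p, x):
    """P(X = x) for a Bernoulli variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p


joint = np.zeros((2, 2, 2, 2))  # axes: X1, X2, X3, X4
for x1, x2, x3, x4 in np.ndindex(2, 2, 2, 2):
    joint[x1, x2, x3, x4] = (bern(p1, x1) * bern(p2[x1], x2)
                             * bern(p3[x1, x2], x3) * bern(p4[x1, x2, x3], x4))


def pairwise_ci(joint, a, b, c, x3):
    """Test X_a ⊥ X_b | X_c, X3 = x3, where a, b, c are axes 0 (X1), 1 (X2), 3 (X4)."""
    sliced = joint[:, :, x3, :]               # axes: X1, X2, X4
    pos = {0: 0, 1: 1, 3: 2}
    t = np.moveaxis(sliced, [pos[a], pos[b], pos[c]], [0, 1, 2])  # axes: a, b, c
    for xc in range(2):
        m = t[:, :, xc] / t[:, :, xc].sum()   # P(x_a, x_b | x_c, X3 = x3)
        if not np.allclose(m, np.outer(m.sum(axis=1), m.sum(axis=0))):
            return False
    return True


print(pairwise_ci(joint, 3, 0, 1, 0))  # True:  X4 ⊥ X1 | X2, X3 = 0
print(pairwise_ci(joint, 3, 1, 0, 0))  # True:  X4 ⊥ X2 | X1, X3 = 0
print(pairwise_ci(joint, 3, 0, 1, 1))  # False: the relation does not hold when X3 = 1
```

Both pairwise relations hold because the joint distribution restricted to $X_3 = 0$ factors into a term in $(x_1, x_2)$ times a term in $x_4$. This is the weak-union step mentioned in the caption.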

Theorems & Definitions (40)

  • Example 1.1
  • Example 3.1
  • Remark 3.2
  • Theorem 4.1
  • Remark 4.2
  • Theorem 4.3
  • Remark 6.1
  • Remark 6.2
  • Remark 6.3
  • Example A.1
  • ...and 30 more