Neural Graph Revealers

Harsh Shrivastava; Urszula Chajewska

TL;DR

This work proposes Neural Graph Revealers (NGRs), which attempt to efficiently merge sparse graph recovery methods with probabilistic graphical models (PGMs) into a single flow. It also introduces the "Graph-constrained path norm", which NGRs use to learn a graphical model that captures complex non-linear functional dependencies between the features in the form of an undirected sparse graph.
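To make the idea of a path-norm penalty sitting next to a multitask regression loss more concrete, here is a minimal NumPy sketch that only evaluates such an objective for a one-hidden-layer MLP mapping the D features back to themselves. Everything specific in it is an assumption, not the paper's exact formulation: the tanh activation, the squared-error loss, the hypothetical helper names, and the constraint matrix `S_c`, which here gives every input-to-output path unit weight (encouraging sparsity) and a large hypothetical weight on the x_i to x_i paths so that a feature cannot trivially predict itself. Training would minimize this objective with a gradient-based optimizer, which is not shown.

```python
import numpy as np

def mlp_forward(X, W1, b1, W2, b2):
    # One-hidden-layer MLP that maps the D input features back to D outputs.
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2

def path_matrix(W1, W2):
    # Entry (i, j) sums |weight| products over every path from input i to output j.
    return np.abs(W1) @ np.abs(W2)

def ngr_style_objective(X, W1, b1, W2, b2, S_c, lam):
    # Multitask regression loss: each feature is predicted from the inputs.
    residual = X - mlp_forward(X, W1, b1, W2, b2)
    recon = np.mean(np.sum(residual ** 2, axis=1))
    # Path-norm penalty weighted by the (assumed) constraint matrix S_c.
    penalty = np.sum(path_matrix(W1, W2) * S_c)
    return recon + lam * penalty

rng = np.random.default_rng(0)
M, D, H = 100, 5, 16                      # samples, features, hidden units
X = rng.standard_normal((M, D))
W1, b1 = 0.1 * rng.standard_normal((D, H)), np.zeros(H)
W2, b2 = 0.1 * rng.standard_normal((H, D)), np.zeros(D)

# Hypothetical constraint: unit weight on all paths, heavy weight on x_i -> x_i.
S_c = np.ones((D, D)) + 99.0 * np.eye(D)
loss = ngr_style_objective(X, W1, b1, W2, b2, S_c, lam=1e-2)
```

Because the penalty acts on products of absolute weights along paths, driving a single entry of `path_matrix` to zero only requires cutting the paths between one input and one output, which is what lets the learned network expose a sparse dependency graph.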

Abstract

Sparse graph recovery methods work well when the data follows their assumptions, but they are often not designed for downstream probabilistic queries. This limits their adoption to identifying connections among the input variables. On the other hand, Probabilistic Graphical Models (PGMs) assume an underlying base graph between the variables and learn a distribution over them. PGM design choices are made carefully so that the inference and sampling algorithms are efficient, which brings in certain restrictions and often simplifying assumptions. In this work, we propose Neural Graph Revealers (NGRs), which attempt to efficiently merge sparse graph recovery methods with PGMs into a single flow. The problem setting consists of input data X with D features and M samples; the task is to recover a sparse graph showing the connections between the features and to jointly learn a probability distribution over them. NGRs view neural networks as a "glass box" or, more specifically, as a multitask learning framework. We introduce the "Graph-constrained path norm" that NGRs leverage to learn a graphical model that captures complex non-linear functional dependencies between the features in the form of an undirected sparse graph. Furthermore, NGRs can handle multimodal inputs such as images, text, categorical data and embeddings, which are not straightforward to incorporate in existing methods. We show experimental results of sparse graph recovery and probabilistic inference on data from Gaussian graphical models and on a multimodal infant mortality dataset from the Centers for Disease Control and Prevention.
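For the Gaussian graphical model experiments, the ground-truth graph is encoded by the precision matrix Θ, and the partial correlation ρ_ij = −Θ_ij / sqrt(Θ_ii Θ_jj) gives each edge its sign (positive edges are drawn green and negative edges red in Figure 5). The NumPy sketch below builds a hypothetical 4-node chain GGM, samples an M × D input X from it, and recovers the partial correlations from the empirical covariance. The precision matrix, sample size and seed are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

def partial_correlations(Theta):
    # rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj): correlation of features
    # i and j conditioned on all the remaining features.
    d = np.sqrt(np.diag(Theta))
    R = -Theta / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Hypothetical chain x0 - x1 - x2 - x3 (tridiagonal, positive definite precision).
Theta = np.array([[ 1.0, -0.4,  0.0,  0.0],
                  [-0.4,  1.0,  0.4,  0.0],
                  [ 0.0,  0.4,  1.0, -0.4],
                  [ 0.0,  0.0, -0.4,  1.0]])
R = partial_correlations(Theta)           # zeros mark conditional independence

# Sample the M x D input X that a graph recovery method would be trained on.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(Theta), size=20000)

# Estimate partial correlations from the data via the inverse sample covariance.
R_hat = partial_correlations(np.linalg.inv(np.cov(X, rowvar=False)))
```

In this example the (x0, x1) and (x2, x3) edges carry positive partial correlations, the (x1, x2) edge a negative one, and every non-adjacent pair is conditionally independent.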

Paper Structure (11 sections, 4 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Related Methods
Neural Graph Revealers
Representation
Optimization
Modeling multi-modal data
Representation as a probabilistic graphical model
Experiments
Learning Gaussian Graphical Models
Infant Mortality data analysis
Conclusions, Discussions & Future work

Figures (7)

Figure 1: Graph recovery approaches. Methods designed to recover undirected graphs are categorized. Neural Graph Revealers (NGRs) fall under the regression-based algorithms. The algorithms listed here (leaf nodes) are representative of their sub-categories; the list is not exhaustive.
Figure 2: Workflow of NGRs. (left) We start with a fully connected neural network (here an MLP) whose input and output are both the given features $x_i$. Viewing the NN as a multitask learning framework means that, in this initial fully connected setting, every output feature depends on all the input features. (middle) The learned NGR optimizes the network connections to fit the regression on the input data while also satisfying the sparsity constraints (see the NGR optimization equation). A path from an input feature to an output feature indicates a dependency, potentially non-linear, between them. The larger the NN (number of layers, hidden-unit dimensions), the richer the functional representation. For clarity, not all of the MLP's weights are shown; weights dropped during training are drawn as grey dashed lines. (right) The sparse dependency graph between the input and output of the MLP reduces to its normalized weight matrix product $S_{G} = \operatorname{norm}\left(|W_1|\times |W_2|\right)$.
Figure 3: Multi-modal data handling with projection modules. The input X can be one-hot (categorical), an image or, in general, an embedding (text, audio, speech and other data types). Projection modules (encoder + decoder) are used as a wrapper around the NGR base architecture. The architecture of the projection modules depends on the input data type and the user's design choices. Note that the output of the encoder can be more than one unit ($e_1$ can be a hypernode), in which case the corresponding adjacency matrix $S_{\text{diag}}$ of the graph-constrained path norm is adjusted accordingly. The decoder side of the NGR architecture is updated similarly. The remaining details are the same as in Fig. 2.
Figure 4: Multi-modal data handling with the graph-constrained path norm (GcPn). Without loss of generality, we consider an input X of embeddings that can come from text, audio, speech and other data types. We extend the idea of applying GcPn to the encoder MLP and the decoder MLP. We initialize a fully connected MLP and then, using the GcPn penalties, capture the desired input-to-output unit path dependencies after optimizing the GcPn optimization equation. Neural network nodes containing embeddings are shown as hypernodes: for brevity, all units of the embedding vector within a hypernode are treated as a single unit when deciding the edge connections that define the graph. The encoder and decoder MLPs are used as a wrapper around the NGR base architecture. The remaining details are the same as in Fig. 2.
Figure 5: Modeling GGMs using NGRs. (left) The conditional independence graph [shrivastava2022methods] for the chain structure. Positive partial correlations between nodes are shown in green and negative partial correlations in red. A positive partial correlation between nodes (A, B) means that an increase in the value of A corresponds to an increase in the value of B; a negative partial correlation means the opposite. These correlations capture direct dependence, that is, the dependence is evaluated conditioned on all the other nodes. (middle, right) We observe that the slopes learned by the NGR match the trends in the GGM graph, showing that the learned dependency plots agree with the signs (colors) of the partial correlation edges.
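Figure 2 states that the dependency graph reduces to the normalized product $S_G = \operatorname{norm}(|W_1| \times |W_2|)$, and Figure 4 treats the units of an embedding as a single hypernode. The NumPy sketch below reads a graph out of trained weights under assumptions that are not taken from the paper: `norm` divides by the largest entry, an undirected edge score is the average of S_G and its transpose, edges are kept above a hypothetical `threshold`, and a hypernode pair's score is the sum of the path weights between their units. All helper names are hypothetical.

```python
import numpy as np
from functools import reduce

def dependency_matrix(weights):
    # Product of absolute weight matrices |W_1| x |W_2| x ... x |W_L| (row-vector
    # convention, so W_1 is D_in x H_1 and W_L is H_{L-1} x D_out).
    P = reduce(np.matmul, [np.abs(W) for W in weights])
    # Assumed normalization: scale entries to [0, 1] by the largest one.
    m = P.max()
    return P / m if m > 0 else P

def hypernode_scores(P, group_sizes):
    # Sum path weights over all unit pairs that belong to each pair of hypernodes.
    groups = np.repeat(np.arange(len(group_sizes)), group_sizes)
    assign = np.eye(len(group_sizes))[groups]   # one-hot unit -> hypernode map
    return assign.T @ P @ assign

def recover_graph(S, threshold=0.1):
    # Assumption: an undirected edge score is the symmetrized dependency score.
    S_sym = 0.5 * (S + S.T)
    np.fill_diagonal(S_sym, 0.0)                # self-dependencies are not edges
    return (S_sym > threshold).astype(int)

# Toy 3-feature network: x0 -> h0 -> x1 and x1 -> h1 -> x0; x2 is unconnected.
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
W2 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0]])
S_G = dependency_matrix([W1, W2])
adj = recover_graph(S_G)                        # single edge x0 - x1

# Hypernode example: 6 units grouped as a scalar, a 3-dim and a 2-dim embedding.
H = hypernode_scores(np.ones((6, 6)), [1, 3, 2])
```

Summing over units is only one aggregation choice; taking the maximum over each block would be an equally plausible way to decide whether two hypernodes are connected.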