GmGM: a Fast Multi-Axis Gaussian Graphical Model

Bailey Andrew, David Westhead, Luisa Cutillo

TL;DR

We introduce the Gaussian multi-Graphical Model (GmGM), a model that constructs sparse graph representations of matrix- and tensor-variate data. It uses only a single eigendecomposition per axis, achieving an order-of-magnitude speedup over prior work in the ungeneralized case.

Abstract

This paper introduces the Gaussian multi-Graphical Model, a model to construct sparse graph representations of matrix- and tensor-variate data. We generalize prior work in this area by simultaneously learning this representation across several tensors that share axes, which is necessary to allow the analysis of multimodal datasets such as those encountered in multi-omics. Our algorithm uses only a single eigendecomposition per axis, achieving an order-of-magnitude speedup over prior work in the ungeneralized case. This allows the use of our methodology on large multimodal datasets such as single-cell multi-omics data, which was challenging with previous approaches. We validate our model on synthetic data and five real-world datasets.
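To make "a single eigendecomposition per axis" concrete, the following is a minimal sketch of that one step only, not of the GmGM estimator itself. It assumes the matrix being decomposed for each axis is the Gram matrix of the tensor unfolded along that axis, an assumption made here to stand in for the matrix $\mathbf{S}_\ell$ of Theorem 1. The helper names `axis_gram` and `eigendecompose_axes` are hypothetical and do not come from the paper or its software.

```python
import numpy as np


def axis_gram(X, axis):
    """Gram matrix of X unfolded along `axis` (assumed stand-in for S_l)."""
    # Move the chosen axis to the front, then flatten all remaining axes.
    unfolded = np.moveaxis(X, axis, 0).reshape(X.shape[axis], -1)
    return unfolded @ unfolded.T


def eigendecompose_axes(X):
    """Return one (eigenvalues, eigenvectors) pair per axis of X."""
    # eigh is used because each Gram matrix is symmetric.
    return [np.linalg.eigh(axis_gram(X, axis)) for axis in range(X.ndim)]


# Toy 3-axis tensor; the shape is arbitrary and purely illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
decomps = eigendecompose_axes(X)

for axis, (e, V) in enumerate(decomps):
    # Each decomposition reconstructs its Gram matrix: V diag(e) V^T.
    assert np.allclose(V @ np.diag(e) @ V.T, axis_gram(X, axis))
```

In this sketch the number of eigendecompositions equals the number of axes, so it does not grow with the number of iterations that any later optimization step might need.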

Paper Structure

This paper contains 14 sections, 3 theorems, 8 equations, 8 figures, and 1 table.

Key Result

Theorem 1

Let $\mathbf{V}_\ell\mathrm{diag}\left[\mathbf{e}_\ell\right]\mathbf{V}_\ell^T$ be the eigendecomposition of $\mathbf{S}_\ell$ (where $\mathbf{V}_\ell \in \mathbb{R}^{d_\ell \times d_\ell}$ and $\mathrm{diag}\left[\mathbf{e}_\ell\right] \in \mathbb{R}^{d_\ell \times d_\ell}$ is a diagonal matrix wit

Figures (8)

  • Figure 1: The two matrices of the LifeLines-DEEP dataset. As both matrices include data for the same people, the learned graph between people should be the same.
  • Figure 2: A graphical overview of how the GmGM algorithm works. We use $\gamma$ to represent an arbitrary modality, and $\ell$ to represent an arbitrary axis. Proofs are given in the supplementary material.
  • Figure 3: A comparison of the runtimes of our algorithm against (a) bi-graphical, (b) 3-axis, and (c) 4-axis prior work.
  • Figure 4: Precision-recall curves comparing various algorithms on synthetic 50x50 data, averaged over multiple runs. For (a) and (c), true graphs have each edge independently having a 2% chance of existing. For (b), true graphs are generated from an AR(1) process. Shaded background represents standard deviation. (a) Two 50x50 matrices with a shared axis. EiGLasso and 'Unimodal GmGM' only consider one of the matrices. (b) A single 50x50 matrix. (c) A single 50x50x50 tensor. With the restricted L1 penalty, our algorithm performs nearly perfectly in the tensor-variate case.
  • Figure 5: (a) Assortativity as we increase the number of edges in the final graph. EiGLasso data was generated by calculating the assortativity-per-regularization penalty, and calculating the edges-per-cell from this data afterwards; this is the cause for the curve's strange behavior around 20 edges per cell. (b) The COIL-20 duck, in which the rows, columns and frames are shuffled. (c) GmGM does an almost perfect job at recovering the duck; unfortunately, it gets cut in half. (d) Predicted precision matrix for echocardiogram frames (yellow=connected, blue=disconnected). The periodic structure of the heartbeat is evident in the diagonals.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2: GmGM Estimator with Priors
  • Theorem 3