Making Multi-Axis Models Robust to Multiplicative Noise: How, and Why?

Bailey Andrew, David R. Westhead, Luisa Cutillo

Abstract

In this paper we develop a graph-learning algorithm, MED-MAGMA, to fit multi-axis (Kronecker-sum-structured) models corrupted by multiplicative noise. This type of noise arises in many application domains, such as single-cell RNA sequencing, where it captures technical biases of RNA sequencing platforms. We evaluate our method against prior work on every public dataset in the Single Cell Expression Atlas under a certain size, demonstrating that it learns networks with better local and global structure. MED-MAGMA is available as a Python package (MED-MAGMA).

Paper Structure

This paper contains 26 sections, 4 theorems, 36 equations, 6 figures, 1 table.

Key Result

Proposition 1

The function $f$ described in Equation eq:med-magma-fiber has the following property: $f(\mathbf{X}) = f(\mathbf{Y})$ if and only if one can find strictly positive $\mathbf{r}_\mathrm{rows},\mathbf{r}_\mathrm{cols}$ such that we have $\mathrm{vec}\left[\mathbf{X}\right] = \left(\mathbf{r}_\mathrm{co
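
The proposition concerns matrices that differ only by strictly positive row and column scale factors. As a rough illustration of that kind of rescaling, the sketch below assumes a noise model in which entry $(i, j)$ of a matrix is multiplied by $r_\mathrm{rows}[i] \cdot r_\mathrm{cols}[j]$; this model, the helper names `apply_multiplicative_noise` and `vec`, and the column-major vectorization convention are assumptions made for the example, not taken from the paper or the MED-MAGMA package. It checks the standard identity $\mathrm{vec}[\mathrm{diag}(\mathbf{r})\,\mathbf{X}\,\mathrm{diag}(\mathbf{c})] = (\mathbf{c} \otimes \mathbf{r}) \circ \mathrm{vec}[\mathbf{X}]$ numerically.

```python
import numpy as np


def apply_multiplicative_noise(X, r_rows, r_cols):
    """Scale entry (i, j) of X by r_rows[i] * r_cols[j].

    Hypothetical helper: the per-row / per-column noise model is an
    assumption for illustration, not the paper's stated definition.
    """
    r_rows = np.asarray(r_rows, dtype=float)
    r_cols = np.asarray(r_cols, dtype=float)
    if np.any(r_rows <= 0) or np.any(r_cols <= 0):
        raise ValueError("scale factors must be strictly positive")
    # Broadcasting: rows scaled by r_rows, columns scaled by r_cols.
    return r_rows[:, None] * X * r_cols[None, :]


def vec(X):
    """Column-major vectorization (stack the columns of X)."""
    return X.reshape(-1, order="F")


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
r_rows = rng.uniform(0.5, 2.0, size=4)
r_cols = rng.uniform(0.5, 2.0, size=3)

Y = apply_multiplicative_noise(X, r_rows, r_cols)

# With column-major vec, the rescaling is an elementwise product with
# the Kronecker product of the column and row scale vectors.
assert np.allclose(vec(Y), np.kron(r_cols, r_rows) * vec(X))
```

Under this assumed model, any function that is invariant to such rescalings cannot distinguish `X` from `Y`, which is the sort of invariance the proposition characterizes.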

Figures (6)

  • Figure 1: Estimated tail dependence across 1000 randomly selected pairs of genes (left) and cells (right) from the E-MTAB-7249 scRNA-seq dataset discussed in Section \ref{sec:med-magma-specific}, compared to random data generated from a Gaussian copula.
  • Figure 2: (Both) Results across 20 synthetic Kronecker-sum-structured Gaussian $100 \times 150$ datasets corrupted by varying levels of multiplicative noise. Median result is highlighted. As noise strength varies, we plot the area under PR curves (left), assortativity (center), and AMI (right).
  • Figure 3: (Both) The datasets are ordered by increasing performance along the x-axis. GmGM/nonpara refers to GmGM equipped with the nonparanormal skeptic. The median performance is highlighted. (Left) Assortativities below 0 represent a statistical tendency for cells of different categories to connect, which indicates the network does capture information about the dataset, but it is harder to interpret.
  • Figure 4: (All) A UMAP plot of E-MTAB-7249 cells based on gene expression. (Left) Cell type. (Middle, Right) The percentage of gene expression from genes in modules m0 and m3, respectively.
  • Figure 5: (Both) Results across 20 synthetic Kronecker-sum-structured Gaussian $200 \times 200$ datasets corrupted by varying levels of multiplicative noise. Median result is highlighted. As noise strength varies, we plot the area under PR curves (left), assortativity (center), and AMI (right).
  • ...and 1 more figure

Theorems & Definitions (9)

  • Proposition: Goodness of $f$
  • proof
  • Proposition: Maximizing $Q$
  • proof
  • Proposition: Goodness of $f$
  • proof
  • Definition: GmGM (andrew_gmgm_2024)
  • Proposition: Maximizing $Q$
  • proof