BVSIMC: Bayesian Variable Selection-Guided Inductive Matrix Completion for Improved and Interpretable Drug Discovery

Sijian Fan, Liyan Xiong, Dayuan Wang, Guoshuai Cai, Ray Bai

Abstract

Recent advances in drug discovery have demonstrated that incorporating side information (e.g., chemical properties about drugs and genomic information about diseases) often greatly improves prediction performance. However, these side features can vary widely in relevance and are often noisy and high-dimensional. We propose Bayesian Variable Selection-Guided Inductive Matrix Completion (BVSIMC), a new Bayesian model that enables variable selection from side features in drug discovery. By learning sparse latent embeddings, BVSIMC improves both predictive accuracy and interpretability. We validate our method through simulation studies and two drug discovery applications: 1) prediction of drug resistance in Mycobacterium tuberculosis, and 2) prediction of new drug-disease associations in computational drug repositioning. On both synthetic and real data, BVSIMC outperforms several other state-of-the-art methods in terms of prediction. In our two real examples, BVSIMC further reveals the most clinically meaningful side features.

Paper Structure

This paper contains 13 sections, 1 theorem, 31 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Let $\widehat{\mathbf{a}}_k$ denote the global mode of the proximal operator proximal-gradient-update, and let $\widetilde{\theta}_k$ be defined as in thetak. Then where $\mathbf{z}_k = \mathbf{a}_k^{(t-1)} - \eta \nabla f(\mathbf{a}_k^{(t-1)})$, $\Delta = \inf_{\mathbf{x}} \{ \lVert \mathbf{x} \rVert_2 / 2 - \eta \cdot \text{pen}(\mathbf{x} \mid \widetilde{\theta}_k)/ \lVert \mathbf{x} \rVert_2 \}$, an

Figures (5)

  • Figure 1: Overview of the proposed BVSIMC framework
  • Figure 2: Results from the M. tb drug resistance prediction analysis. The training size is the proportion $\rho$ of observed (non-masked) entries.
  • Figure 3: Top eight functional groups selected by BVSIMC. Left panel: Heatmap of the 13 drugs vs. these functional groups. Right panel: Chemical structures of these groups.
  • Figure 4: Drug repositioning prediction results for Cdataset. The training size is the proportion $\rho$ of observed (non-masked) entries.
  • Figure 5: AUC scores under different confidence parameters $\xi$.

Theorems & Definitions (1)

  • Proposition 1