BVSIMC: Bayesian Variable Selection-Guided Inductive Matrix Completion for Improved and Interpretable Drug Discovery

Sijian Fan; Liyan Xiong; Dayuan Wang; Guoshuai Cai; Ray Bai

Abstract

Recent advances in drug discovery have demonstrated that incorporating side information (e.g., chemical properties of drugs and genomic information about diseases) often greatly improves prediction performance. However, these side features can vary widely in relevance and are often noisy and high-dimensional. We propose Bayesian Variable Selection-Guided Inductive Matrix Completion (BVSIMC), a new Bayesian model that enables variable selection from side features in drug discovery. By learning sparse latent embeddings, BVSIMC improves both predictive accuracy and interpretability. We validate our method through simulation studies and two drug discovery applications: 1) prediction of drug resistance in Mycobacterium tuberculosis, and 2) prediction of new drug-disease associations in computational drug repositioning. On both synthetic and real data, BVSIMC outperforms several other state-of-the-art methods in predictive performance. In our two real examples, BVSIMC further reveals the most clinically meaningful side features.

Paper Structure (13 sections, 1 theorem, 31 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Methodology
Bayesian IMC
BVSIMC Prior Formulation
Implementation
Results
Simulation Analysis
Predicting Mycobacterium Tuberculosis Drug Resistance
Predicting Drug-Target Associations in Drug Repositioning
Conclusion
Details and Derivations for the BVSIMC algorithm
Updates for the Latent Factor Matrices
Updates for the Sparsity Parameters

Key Result

Proposition 1

Let $\widehat{\mathbf{a}}_k$ denote the global mode of the proximal operator in (proximal-gradient-update), and let $\widetilde{\theta}_k$ be defined as in (thetak). Then where $\mathbf{z}_k = \mathbf{a}_k^{(t-1)} - \eta \nabla f(\mathbf{a}_k^{(t-1)})$, $\Delta = \inf_{\mathbf{x}} \{ \lVert \mathbf{x} \rVert_2 / 2 - \eta \cdot \text{pen}(\mathbf{x} \mid \widetilde{\theta}_k) / \lVert \mathbf{x} \rVert_2 \}$, an
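
To make the shape of such an update concrete, the following is a minimal sketch of one proximal-gradient step on a single group of coefficients. It computes the gradient step $\mathbf{z}_k = \mathbf{a}_k^{(t-1)} - \eta \nabla f(\mathbf{a}_k^{(t-1)})$ stated in the proposition and then applies a thresholding rule. The thresholding used here is the standard group-lasso proximal operator with a fixed penalty weight, which is an assumption made for illustration: it is not the BVSIMC penalty $\text{pen}(\cdot \mid \widetilde{\theta}_k)$, and the constant threshold does not correspond to $\Delta$. All function and parameter names (`gradient_step`, `group_soft_threshold`, `proximal_gradient_update`, `lam`) are hypothetical.

```python
import math


def gradient_step(a_prev, grad, eta):
    """Return z = a_prev - eta * grad, the plain gradient step."""
    return [a - eta * g for a, g in zip(a_prev, grad)]


def group_soft_threshold(z, threshold):
    """Group-lasso proximal operator (illustrative stand-in penalty).

    The whole group is set to zero when its Euclidean norm is at most
    `threshold`; otherwise the group is shrunk toward zero by a common factor.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= threshold:
        return [0.0] * len(z)
    scale = 1.0 - threshold / norm
    return [scale * v for v in z]


def proximal_gradient_update(a_prev, grad, eta, lam):
    """One proximal-gradient update for a single coefficient group.

    `lam` is an assumed fixed penalty weight; the effective threshold
    after a step of size `eta` is eta * lam.
    """
    z = gradient_step(a_prev, grad, eta)
    return group_soft_threshold(z, eta * lam)
```

Under this stand-in penalty, a group whose gradient step has a small norm is zeroed out entirely, which is the mechanism by which a group-sparse update can drop an irrelevant side feature from the latent embedding.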

Figures (5)

Figure 1: Overview of the proposed BVSIMC framework
Figure 2: Results from the M. tb drug resistance prediction analysis. The training size is the proportion $\rho$ of observed (non-masked) entries.
Figure 3: Top eight functional groups selected by BVSIMC. Left panel: Heatmap of the 13 drugs vs. these functional groups. Right panel: Chemical structures of these groups.
Figure 4: Drug repositioning prediction results for Cdataset. The training size is the proportion $\rho$ of observed (non-masked) entries.
Figure 5: AUC scores under different confidence parameters $\xi$.

Theorems & Definitions (1)

Proposition 1
