
Supervised Bayesian joint graphical model for simultaneous network estimation and subgroup identification

Xing Qin, Xu Liu, Shuangge Ma, Mengyun Wu

TL;DR

A supervised Bayesian joint graphical model is developed to simultaneously identify multiple heterogeneous networks and subgroups in cancer. A new similarity prior accommodates similarities among the networks of different subgroups, facilitating clinically meaningful biological network construction and subgroup identification.

Abstract

Heterogeneity is a fundamental characteristic of cancer. To accommodate it, subgroup identification has been extensively studied, and existing approaches fall broadly into unsupervised and supervised analyses. Compared with unsupervised analysis, supervised approaches potentially carry greater clinical implications. Under the unsupervised framework, several network-based subgroup identification methods have been developed; by incorporating the interconnections among variables, they offer more comprehensive insights than methods restricted to means, variances, and other simple distributional features. However, research on supervised network-based subgroup identification remains limited. In this study, we develop a novel supervised Bayesian joint graphical model (SBJGM) for jointly identifying multiple heterogeneous networks and subgroups. In the proposed model, heterogeneity is not only reflected in the molecular data but also associated with a clinical outcome, and a novel similarity prior is introduced to accommodate similarities among the networks of different subgroups, facilitating clinically meaningful biological network construction and subgroup identification. Consistency of the estimates is rigorously established, and an efficient algorithm is developed. Extensive simulation studies and an application to The Cancer Genome Atlas (TCGA) data demonstrate the advantages of the proposed approach in both subgroup and network identification.
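
To make the supervised idea concrete, here is a minimal sketch (not the authors' SBJGM) of how subgroup membership can depend on both the molecular data and the clinical outcome. It assumes, as an illustration only, that each subgroup $k$ has a mean $\mu_k$ and a precision matrix $\Omega_k$ for the molecular vector $x$ (the nonzero off-diagonal entries of $\Omega_k$ define the subgroup network), plus a Gaussian outcome $y = x^\top \beta_k + \varepsilon$ with noise standard deviation $\sigma_k$. The function names `gaussian_logpdf_precision` and `subgroup_responsibilities` are hypothetical, and the similarity prior, the paper's actual likelihood, and its estimation algorithm are not modeled.

```python
import numpy as np

def gaussian_logpdf_precision(X, mu, Omega):
    """Row-wise log N(x; mu, Omega^{-1}), parameterized by the precision matrix Omega."""
    p = X.shape[1]
    diff = X - mu
    sign, logdet = np.linalg.slogdet(Omega)  # log|Omega|; requires Omega positive definite
    if sign <= 0:
        raise ValueError("Omega must be positive definite")
    quad = np.einsum("ij,jk,ik->i", diff, Omega, diff)  # (x - mu)^T Omega (x - mu) per row
    return 0.5 * (logdet - p * np.log(2 * np.pi) - quad)

def subgroup_responsibilities(X, y, weights, mus, Omegas, betas, sigmas):
    """Posterior subgroup probabilities combining the network part (X) and outcome part (y).

    Hypothetical model: x | k ~ N(mu_k, Omega_k^{-1}),  y | x, k ~ N(x^T beta_k, sigma_k^2).
    """
    n, K = X.shape[0], len(weights)
    log_r = np.empty((n, K))
    for k in range(K):
        resid = y - X @ betas[k]
        log_y = -0.5 * (np.log(2 * np.pi * sigmas[k] ** 2) + resid ** 2 / sigmas[k] ** 2)
        log_r[:, k] = np.log(weights[k]) + gaussian_logpdf_precision(X, mus[k], Omegas[k]) + log_y
    log_r -= log_r.max(axis=1, keepdims=True)  # log-sum-exp trick for numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

# Usage on synthetic data with two subgroups (all values are illustrative).
rng = np.random.default_rng(0)
p = 3
Omega1 = np.eye(p)
Omega2 = np.array([[2.0, 0.8, 0.0], [0.8, 2.0, 0.0], [0.0, 0.0, 1.0]])  # edge between x1 and x2
mus = [np.zeros(p), np.full(p, 3.0)]
betas = [np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])]
X = np.vstack([rng.multivariate_normal(mus[0], np.linalg.inv(Omega1), size=50),
               rng.multivariate_normal(mus[1], np.linalg.inv(Omega2), size=50)])
y = np.concatenate([X[:50] @ betas[0], X[50:] @ betas[1]]) + rng.normal(0.0, 0.5, size=100)
R = subgroup_responsibilities(X, y, [0.5, 0.5], mus, [Omega1, Omega2], betas, [0.5, 0.5])
labels = R.argmax(axis=1)
print("assignment accuracy:", ((labels[:50] == 0).sum() + (labels[50:] == 1).sum()) / 100)
```

Working on the log scale and subtracting each row's maximum before exponentiating keeps the probabilities stable when $p$ is large and the individual densities would underflow.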

Paper Structure (10 sections, 1 theorem, 8 equations, 3 figures, 1 table)

Key Result

Theorem 1

Let $\tilde{s} = d_{\boldsymbol{\beta}}\sqrt{d_{\boldsymbol{\mu}}} + d_{\boldsymbol{\beta}}^2 + \sqrt{p} + \sqrt{s + p/K}$, and assume that $\tilde{s}\sqrt{K^3 \log p} = o(\sqrt{n})$. Then:

1. Under Assumptions 1-5 (supplementary materials), the non-asymptotic bound of the estimation error of …
2. Under Assumptions 1-6 (supplementary materials), the thresholded estimators …
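
The rate condition in Theorem 1 is easy to explore numerically. The sketch below simply evaluates $\tilde{s}$ and the ratio $\tilde{s}\sqrt{K^3 \log p}/\sqrt{n}$, which the theorem requires to vanish as $n \to \infty$. The quantities $d_{\boldsymbol{\beta}}$, $d_{\boldsymbol{\mu}}$, and $s$ are defined in the paper (presumably as sparsity levels) and are treated here only as numeric inputs; the function names and example values are hypothetical.

```python
import math

def s_tilde(d_beta, d_mu, p, s, K):
    """s~ = d_beta * sqrt(d_mu) + d_beta^2 + sqrt(p) + sqrt(s + p/K), as in Theorem 1."""
    return d_beta * math.sqrt(d_mu) + d_beta ** 2 + math.sqrt(p) + math.sqrt(s + p / K)

def rate_ratio(d_beta, d_mu, p, s, K, n):
    """s~ * sqrt(K^3 log p) / sqrt(n); the theorem assumes this tends to 0."""
    return s_tilde(d_beta, d_mu, p, s, K) * math.sqrt(K ** 3 * math.log(p)) / math.sqrt(n)

# Illustrative values only: the ratio shrinks like 1/sqrt(n) with everything else fixed.
for n in (10 ** 4, 10 ** 6, 10 ** 8):
    print(n, round(rate_ratio(d_beta=4, d_mu=9, p=100, s=200, K=2, n=n), 3))
```

Squaring both sides shows the condition is equivalent to $\tilde{s}^2 K^3 \log p = o(n)$, so a single value of the ratio is only a rough indication; what matters is that $n$ grows faster than $\tilde{s}^2 K^3 \log p$.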

Figures (3)

  • Figure 1: Simulation results for the power-law network under the scenarios with $K=2$ based on 100 replicates.
  • Figure 2: Simulation results for the Erdős-Rényi network under the scenarios with $K=2$ based on 100 replicates.
  • Figure 3: Data analysis: gene networks for the two subgroups identified by the proposed approach. In each network, the highlighted edges are shared by the two subgroups.
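
Figures 1 and 2 report simulations with power-law and Erdős-Rényi networks for $K=2$ subgroups, and Figure 3 highlights edges shared by the two subgroups. The sketch below shows one common way to build such settings and count shared edges. The paper's exact generator, edge weights, and degree of overlap are not given here, so every choice (preferential attachment with one edge per new node, edge weights of magnitude in [0.2, 0.5], diagonal dominance for positive definiteness, rewiring 30% of edges for the second subgroup) is an assumption, and all function names are hypothetical.

```python
import numpy as np

def erdos_renyi_adjacency(p, prob, rng):
    """Symmetric boolean adjacency: each pair is an edge independently with probability prob."""
    upper = np.triu(rng.random((p, p)) < prob, k=1)
    return upper | upper.T

def power_law_adjacency(p, rng, m=1):
    """Preferential attachment (Barabasi-Albert style): heavy-tailed degree distribution."""
    A = np.zeros((p, p), dtype=bool)
    degrees = np.zeros(p)
    for i in range(m + 1):  # fully connected seed of m + 1 nodes
        for j in range(i):
            A[i, j] = A[j, i] = True
    degrees[: m + 1] = m
    for new in range(m + 1, p):
        probs = degrees[:new] / degrees[:new].sum()  # attach proportionally to degree
        targets = rng.choice(new, size=m, replace=False, p=probs)
        A[new, targets] = A[targets, new] = True
        degrees[targets] += 1
        degrees[new] = m
    return A

def perturb_edges(A, frac, rng):
    """Copy A, drop a fraction of its edges, and add the same number of new edges."""
    B = A.copy()
    rows, cols = np.triu_indices(A.shape[0], k=1)
    upper = A[rows, cols]
    edge_idx, non_idx = np.flatnonzero(upper), np.flatnonzero(~upper)
    n_change = int(round(frac * edge_idx.size))
    drop = rng.choice(edge_idx, n_change, replace=False)
    add = rng.choice(non_idx, n_change, replace=False)
    B[rows[drop], cols[drop]] = B[cols[drop], rows[drop]] = False
    B[rows[add], cols[add]] = B[cols[add], rows[add]] = True
    return B

def precision_from_adjacency(A, rng, low=0.2, high=0.5):
    """Random signed weights on edges; diagonal dominance makes the matrix positive definite."""
    p = A.shape[0]
    W = rng.uniform(low, high, size=(p, p)) * rng.choice([-1.0, 1.0], size=(p, p))
    W = np.triu(W, k=1)
    W = (W + W.T) * A  # keep weights only where A has an edge
    np.fill_diagonal(W, np.abs(W).sum(axis=1) + 0.1)
    return W

def shared_edges(Omega1, Omega2, tol=1e-8):
    """Edges (i < j) that are nonzero in both precision matrices."""
    rows, cols = np.triu_indices(Omega1.shape[0], k=1)
    both = (np.abs(Omega1[rows, cols]) > tol) & (np.abs(Omega2[rows, cols]) > tol)
    return list(zip(rows[both].tolist(), cols[both].tolist()))

rng = np.random.default_rng(1)
p = 20
A1 = erdos_renyi_adjacency(p, 0.1, rng)   # subgroup 1 network
A2 = perturb_edges(A1, 0.3, rng)          # subgroup 2: 30% of edges rewired
Omega1 = precision_from_adjacency(A1, rng)
Omega2 = precision_from_adjacency(A2, rng)
common = shared_edges(Omega1, Omega2)
n_edges1 = int(A1[np.triu_indices(p, k=1)].sum())
print(f"{len(common)} of {n_edges1} subgroup-1 edges are shared")
pl = power_law_adjacency(30, rng)         # scale-free alternative structure
```

Adding 0.1 to each row's absolute off-diagonal sum makes every precision matrix strictly diagonally dominant with a positive diagonal, which guarantees positive definiteness without any rescaling.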

Theorems & Definitions (1)

  • Theorem 1