Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis

Haonan Zhu; Andre R. Goncalves; Camilo Valdes; Hiranmayi Ranganathan; Boya Zhang; Jose Manuel Martí; Car Reen Kok; Monica K. Borucki; Nisha J. Mulakken; James B. Thissen; Crystal Jaing; Alfred Hero; Nicholas A. Be

TL;DR

The paper tackles binary health-state prediction from high-dimensional microbiome data pooled across multiple studies by introducing a hierarchical Bayesian multitask logistic regression with a shared sparsity prior. The model combines a Bernoulli-Gaussian sparsity pattern, $z_j\sim\mathrm{Bernoulli}(\theta)$ with $\theta\sim\mathrm{Beta}(\alpha_0,\beta_0)$, and a task-wide weight covariance $\boldsymbol\Sigma_0$ with $\boldsymbol\Sigma_0^{-1}\sim\mathrm{Wishart}(v_0,\boldsymbol V_0)$. Because the posterior is intractable, the paper derives scalable variational inference based on a mean-field approximation with coordinate ascent updates. In synthetic and real microbiome experiments, the approach achieves strong support recovery under shared sparsity and provides well-calibrated predictions with uncertainty quantification, even when the pooled data are heterogeneous. The results highlight robustness to cross-study heterogeneity and offer interpretable insights by identifying informative microbial taxa across diseases, with potential for improved biomarker discovery and clinical decision support.
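To make the prior structure above concrete, the following is a minimal NumPy sketch that draws once from a generative model of this shape. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_shared_sparse_model`, the default hyperparameter values, the standard-normal features, and the choice to apply $\boldsymbol\Sigma_0$ across tasks for each feature are all hypothetical, and the Wishart draw uses the sum-of-outer-products construction, which requires an integer $v_0$ no smaller than the number of tasks.

```python
import numpy as np


def sample_shared_sparse_model(n_tasks=3, n_features=20, n_samples=50,
                               alpha0=1.0, beta0=4.0, v0=None, V0=None, seed=0):
    """Draw one sample from an assumed shared-sparsity multitask logistic model."""
    rng = np.random.default_rng(seed)
    v0 = n_tasks + 2 if v0 is None else v0          # assumed default degrees of freedom
    V0 = np.eye(n_tasks) if V0 is None else V0      # assumed default scale matrix

    # Shared sparsity pattern: theta ~ Beta(alpha0, beta0), z_j ~ Bernoulli(theta).
    theta = rng.beta(alpha0, beta0)
    z = rng.binomial(1, theta, size=n_features)

    # Precision ~ Wishart(v0, V0), built as a sum of v0 outer products of N(0, V0) draws.
    G = rng.multivariate_normal(np.zeros(n_tasks), V0, size=v0)   # shape (v0, n_tasks)
    precision = G.T @ G
    Sigma0 = np.linalg.inv(precision)
    Sigma0 = 0.5 * (Sigma0 + Sigma0.T)              # symmetrize against round-off

    # Bernoulli-Gaussian weights: each feature's weights across tasks are N(0, Sigma0),
    # then zeroed for every task at once when z_j = 0 (assumption about the coupling).
    L = np.linalg.cholesky(Sigma0)
    W = rng.standard_normal((n_features, n_tasks)) @ L.T
    W = W * z[:, None]

    # Per-task logistic regression on synthetic features.
    X = rng.standard_normal((n_tasks, n_samples, n_features))
    logits = np.einsum("tnd,dt->tn", X, W)
    probs = 1.0 / (1.0 + np.exp(-logits))
    Y = rng.binomial(1, probs)

    return {"theta": theta, "z": z, "Sigma0": Sigma0, "W": W, "X": X, "Y": Y}


draw = sample_shared_sparse_model()
print("active features:", int(draw["z"].sum()), "of", draw["z"].size)
```

Because a single indicator $z_j$ gates feature $j$ in every task, a feature is either active for all tasks or inactive for all tasks in this sketch, which is the sense in which the sparsity pattern is shared.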

Abstract

This paper proposes a hierarchical Bayesian multitask learning model for the general multitask binary classification problem, under the assumption that the tasks share a common sparsity structure. We derive a computationally efficient inference algorithm based on variational inference to approximate the posterior distribution. We demonstrate the potential of the new approach on various synthetic datasets and on predicting human health status from microbiome profiles. Our analysis incorporates data pooled from multiple microbiome studies, along with a comprehensive comparison with other benchmark methods. Results on synthetic datasets show that the proposed approach achieves superior support recovery when the underlying regression coefficients share a common sparsity structure across tasks. Our experiments on microbiome classification demonstrate the utility of the method in extracting informative taxa while providing well-calibrated predictions with uncertainty quantification and achieving competitive performance on prediction metrics. Notably, despite the heterogeneity of the pooled datasets (e.g., different experimental objectives, laboratory setups, sequencing equipment, patient demographics), our method delivers robust results.
