Multinomial belief networks for healthcare data

H. C. Donker; D. Neijzen; J. de Jong; G. A. Lunter

TL;DR

This work introduces the multinomial belief network (MBN), a deep Bayesian model for healthcare data that handles sparse, high-dimensional, and incomplete count data while quantifying uncertainty. Building on the Poisson gamma belief network (PGBN), the MBN uses multinomial observables and Dirichlet-distributed activations, giving a deep, interpretable representation that is augmented layer by layer through a Dirichlet–multinomial–CRT factorization (CRT: Chinese restaurant table). The authors develop a collapsed Gibbs sampler that propagates information up and down the network and updates the latent weights, activations, and dispersion parameters using conjugacy and augmentation identities. Experiments on handwritten digits and a large cancer mutation dataset show that the MBN discovers coherent hierarchical structure and biologically meaningful mutational signatures in a data-driven way, with better held-out perplexity than nonnegative matrix factorization baselines. The approach enables principled deconvolution of heterogeneous healthcare data and provides the uncertainty estimates needed for clinical decision-making. Scaling to very large datasets remains a challenge, which future approximate or hybrid inference methods may address.
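The layered structure described above can be illustrated with a small forward-sampling sketch. It assumes a generative process in which the top-level hidden units are drawn from a Dirichlet centred on the activations $\pmb{r}$, each lower layer is drawn from a Dirichlet centred on $\pmb{\phi}^{(t+1)}\pmb{\theta}^{(t+1)}$ and scaled by a concentration $c^{(t+1)}$, and the counts are multinomial with probabilities $\pmb{\phi}^{(1)}\pmb{\theta}^{(1)}$. This layout, the helper `random_stochastic`, and all sizes and parameter values are assumptions made for illustration, not the authors' exact specification.

```python
import numpy as np

def random_stochastic(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix whose columns each sum to 1."""
    return rng.dirichlet(np.ones(rows), size=cols).T

def sample_mbn(n_total, phis, r, cs, rng):
    """Draw one count vector from an assumed T-layer multinomial belief network.

    phis : list of column-stochastic matrices; phis[0] maps layer 1 to the
           observed categories, phis[t] maps layer t+1 to layer t.
    r    : top-level activation probabilities (sums to 1).
    cs   : one positive concentration per layer; cs[-1] belongs to the top layer.
    """
    # Top layer: theta^(T) ~ Dir(c * r)
    theta = rng.dirichlet(cs[-1] * r)
    # Walk down the network: theta^(t) ~ Dir(c * phi^(t+1) theta^(t+1))
    for phi, c in zip(reversed(phis[1:]), reversed(cs[:-1])):
        theta = rng.dirichlet(c * (phi @ theta))
    # Observation layer: x ~ Mult(n, phi^(1) theta^(1))
    p = phis[0] @ theta
    p = p / p.sum()  # guard against floating-point drift
    x = rng.multinomial(n_total, p)
    return x, p

rng = np.random.default_rng(0)
V, K = 64, [30, 20, 10]          # assumed sizes, for illustration only
dims = [V] + K
phis = [random_stochastic(dims[t], dims[t + 1], rng) for t in range(len(K))]
r = np.full(K[-1], 1.0 / K[-1])  # uniform top-level activations (assumption)
cs = [50.0, 50.0, 50.0]          # arbitrary concentrations (assumption)
x, p = sample_mbn(200, phis, r, cs, rng)
```

Because every weight matrix is column-stochastic and every hidden vector lies on the simplex, the probabilities at each layer stay normalised, so the total count `n_total` is fixed by the observation rather than modelled.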

Abstract

Healthcare data from patient or population cohorts are often characterized by sparsity, high missingness and relatively small sample sizes. In addition, being able to quantify uncertainty is often important in a medical context. To address these analytical requirements we propose a deep generative Bayesian model for multinomial count data. We develop a collapsed Gibbs sampling procedure that takes advantage of a series of augmentation relations, inspired by the Zhou–Cong–Chen model. We visualise the model's ability to identify coherent substructures in the data using a dataset of handwritten digits. We then apply it to a large experimental dataset of DNA mutations in cancer and show that we can identify biologically meaningful clusters of mutational signatures in a fully data-driven way.
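One building block commonly used in augmentation relations of this kind is the Chinese restaurant table (CRT) distribution: the number of occupied tables after seating $x$ customers in a Chinese restaurant process with concentration $a$. The sketch below samples it as a sum of independent Bernoulli draws, where the customer who arrives after $i$ others opens a new table with probability $a/(a+i)$. The function name `sample_crt` is hypothetical, and the sketch illustrates only the standard distribution, not the specific identities used in the paper.

```python
import numpy as np

def sample_crt(x, a, rng):
    """Sample m ~ CRT(x, a), the table count for x customers with concentration a."""
    if x == 0:
        return 0
    i = np.arange(x)                 # number of customers already seated: 0..x-1
    new_table = rng.binomial(1, a / (a + i))
    return int(new_table.sum())      # the first customer always opens a table

rng = np.random.default_rng(0)
draws = np.array([sample_crt(20, 2.0, rng) for _ in range(20000)])
expected_mean = (2.0 / (2.0 + np.arange(20))).sum()
```

The sample mean of `draws` should be close to `expected_mean`, since the expectation of a sum of Bernoulli variables is the sum of their success probabilities.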

Paper Structure (33 sections, 1 theorem, 57 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
Poisson gamma belief network
Generative model
Deep Poisson representation
Inference
Multinomial belief network
Generative model
Deep multinomial representation
Inference
Experiments
Performance evaluation
UCI ML handwritten digits
Mutational signature attribution
Discussion & Conclusion
Limitations
...and 18 more sections

Key Result

Theorem 1

The joint distributions over $n$, $\{x_k\}$ and $\{m_k\}$ below are identical:

Figures (10)

Figure 1: Schematic representation of the two belief networks. Red nodes are observations, blue dashed circles are latent hidden units, and edges are latent weights.
Figure 2: Two equivalent generative models for a count variable $\pmb{x}^{(1)}$ from the Poisson gamma belief network, using (a) a tower of real-valued latent variables $\pmb{\theta}$, $\pmb{a}$, or (b) latent counts $\pmb{m}$, $\pmb{y}$, $\pmb{x}$. Blunt arrows indicate deterministic relationships. The variable $\pmb{q}^{(1)}$ is a dummy and has a fixed value $1$. In representation (a) the grayed-out counts $\pmb{x}^{(t)}$ and variables $\pmb{q}^{(t)}$, $t>1$, are included for clarity (and have the same distribution as the variables in the right model) but are not used to generate the outcome $\pmb{x}^{(1)}$, and so can be marginalized out.
Figure 3: Procedure to Gibbs sample weights $\pmb{\phi}^{(t)}$, hidden units $\pmb{\theta}^{(t)}$, concentration parameters $c^{(t)}$, and activations $\pmb{r}$ of an MBN given training data $x_{vj}^{(1)}$.
Figure 4: Hierarchy of topics learned by a three-layer MBN (with $[K_1, K_2, K_3] = [30, 20, 10]$ latent components) after training on the Optical Recognition of Handwritten Digits dataset from the UC Irvine Machine Learning Repository [ALPA98]. Topics are represented by their projection $\prod_{l=1}^{t}\pmb{\phi}^{(l)}$ onto the pixels. Separate panels refer to individual Markov chains that were run in parallel.
Figure 5: Posterior of four meta-mutational signatures $\phi^{(2)}_{vk}$ (labelled $k=\mathrm{M}_1, \dots, \mathrm{M}_4$) in terms of COSMIC v3.3 mutational signatures $v=\mathrm{SBS}1, \dots, \mathrm{SBS}94$ (left column) and its projection $\sum_{v=\mathrm{SBS}1}^{\mathrm{SBS}94} \phi^{(1)}_{lv} \phi^{(2)}_{vk}$ onto tri-nucleotide single base substitutions $l$ (right column). Bars indicate the average and 95% quantile range of the posterior samples. On the left, mutational signatures exceeding three times the uniform probability have been marked in bold red.
...and 5 more figures
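The projections used in the captions of Figures 4 and 5 are products of the layer weight matrices. As a minimal sketch, assuming each $\pmb{\phi}^{(l)}$ is column-stochastic (every column is a probability vector over the units of the layer below), the product $\prod_{l=1}^{t}\pmb{\phi}^{(l)}$ is again column-stochastic, so each layer-$t$ topic can be read as a distribution over the observed categories. The function name `project_topics`, the 64-pixel input size, and the random weights are illustrative assumptions; only the layer sizes $[30, 20, 10]$ come from the Figure 4 caption.

```python
import numpy as np

def project_topics(phis, t):
    """Return phi^(1) @ ... @ phi^(t): layer-t topics expressed over observed categories."""
    proj = phis[0]
    for phi in phis[1:t]:
        proj = proj @ phi
    return proj

rng = np.random.default_rng(0)
dims = [64, 30, 20, 10]  # assumed 64 pixels, then K_1, K_2, K_3 from the caption
# Random column-stochastic stand-ins for fitted weights (illustration only)
phis = [rng.dirichlet(np.ones(dims[l]), size=dims[l + 1]).T for l in range(3)]

top_topics = project_topics(phis, 3)  # shape (64, 10): one pixel distribution per topic
mid_topics = project_topics(phis, 2)  # shape (64, 20)
```

Each column of `top_topics` could be reshaped to an image for display, under the assumption that the 64 categories correspond to an 8 by 8 pixel grid.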

Theorems & Definitions (1)

Theorem 1
