Interpretable Generative and Discriminative Learning for Multimodal and Incomplete Clinical Data
Albert Belenguer-Llorens, Carlos Sevilla-Salcedo, Janaina Mourao-Miranda, Vanessa Gómez-Verdejo
TL;DR
This work introduces OSIRIS, a Bayesian model that unifies generative and discriminative learning to tackle multimodal, incomplete clinical data. By maintaining two interdependent latent spaces, $\mathbf{G}$ (generative) and $\mathbf{Z}$ (discriminative), OSIRIS enables automatic missing-view imputation, uncertainty quantification, and sparse latent-factor discovery, while remaining task-focused through a Bayesian logistic link to the labels. The authors develop a mean-field variational inference scheme that uses automatic relevance determination (ARD) priors for sparsity and a Taylor-based bound to handle nonconjugate terms. The resulting latent representations are compact and interpretable, and they separate task-relevant information from auxiliary structure. Empirical results across diverse datasets (including ADNI) show that OSIRIS often outperforms state-of-the-art baselines in AUC and balanced accuracy (BACC), robustly imputes missing data, and reveals clinically meaningful biomarker patterns, underscoring its potential for practical multimodal biomarker discovery and disease prognosis.
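To make the role of a Taylor-based bound concrete, the following is a minimal sketch and not the OSIRIS derivation: the summary above does not state which bound the authors use. As an assumption, it illustrates one well-known quadratic bound of this kind (the Böhning bound), which expands the softplus function $\log(1+e^{x})$ to second order around a point $\xi$ and replaces the curvature by its global maximum of $1/4$. Because the logistic log-likelihood is a negated softplus, an upper bound on the softplus gives a lower bound on the log-likelihood that is quadratic in its argument and therefore compatible with Gaussian variational factors. The names `softplus`, `sigmoid`, and `quadratic_upper_bound` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def sigmoid(x):
    """Logistic function, written with tanh for numerical stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def quadratic_upper_bound(x, xi):
    """Second-order Taylor expansion of softplus around xi, with the
    curvature sigmoid(x) * (1 - sigmoid(x)) replaced by its maximum 1/4.
    (Assumed form for illustration; the source does not specify the bound.)
    """
    return softplus(xi) + sigmoid(xi) * (x - xi) + 0.125 * (x - xi) ** 2


# Check the bound on a grid: the gap is non-negative everywhere
# and vanishes at the expansion point xi.
x = np.linspace(-10.0, 10.0, 2001)
xi = 1.5
gap = quadratic_upper_bound(x, xi) - softplus(x)
print("smallest gap on the grid:", gap.min())
```

In a variational scheme, $\xi$ would typically be treated as a free parameter and updated so that the bound stays tight around the current posterior mean.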
Abstract
Real-world clinical problems are often characterized by multimodal data with incomplete views and limited cohort sizes, which pose significant challenges for machine learning algorithms. In this work, we propose a Bayesian approach designed to handle these challenges efficiently while providing interpretable solutions. Our approach integrates (1) a generative formulation that captures cross-view relationships with a semi-supervised strategy, and (2) a discriminative, task-oriented formulation that identifies the information relevant to specific downstream objectives. This dual generative-discriminative formulation offers both general understanding and task-specific insights; as a result, it automatically imputes missing views while enabling robust inference across different data sources. The potential of this approach becomes evident when it is applied to multimodal clinical data, where our algorithm captures and disentangles the complex interactions among biological, psychological, and sociodemographic modalities.
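The idea of imputing a missing view through a shared generative latent space can be illustrated with a toy example. The sketch below is not the proposed model: it assumes a simple linear-Gaussian two-view factor model with known loading matrices, a standard normal prior on the latent vector, and isotropic noise. The function `impute_missing_view` and all variable names are hypothetical and introduced only for illustration. Given the observed view, it computes the Gaussian posterior over the latent vector and pushes it through the loadings of the missing view, returning both a predicted mean and a covariance that quantifies the imputation uncertainty.

```python
import numpy as np


def impute_missing_view(x_obs, W_obs, W_mis, noise_var=0.1):
    """Impute a missing view in a toy linear-Gaussian two-view model.

    Assumed model (illustrative only):
        g     ~ N(0, I)
        x_obs = W_obs @ g + noise,  noise ~ N(0, noise_var * I)
        x_mis = W_mis @ g + noise,  noise ~ N(0, noise_var * I)
    Returns the predictive mean and covariance of x_mis given x_obs.
    """
    k = W_obs.shape[1]
    # Posterior over the shared latent vector given the observed view.
    precision = np.eye(k) + W_obs.T @ W_obs / noise_var
    cov_g = np.linalg.inv(precision)
    mean_g = cov_g @ W_obs.T @ x_obs / noise_var
    # Predictive distribution of the missing view.
    x_mis_mean = W_mis @ mean_g
    x_mis_cov = W_mis @ cov_g @ W_mis.T + noise_var * np.eye(W_mis.shape[0])
    return x_mis_mean, x_mis_cov


# Small synthetic demonstration with randomly drawn loadings.
rng = np.random.default_rng(0)
W_obs = rng.normal(size=(6, 2))   # loadings of the observed view
W_mis = rng.normal(size=(4, 2))   # loadings of the missing view
g_true = rng.normal(size=2)
x_obs = W_obs @ g_true + np.sqrt(0.1) * rng.normal(size=6)

x_mis_mean, x_mis_cov = impute_missing_view(x_obs, W_obs, W_mis, noise_var=0.1)
print("imputed mean:", x_mis_mean)
print("per-feature predictive std:", np.sqrt(np.diag(x_mis_cov)))
```

In a full Bayesian treatment the loadings and noise level would themselves be inferred rather than fixed, so the predictive uncertainty would also reflect uncertainty in those quantities.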
