Multimodal Learning for Scalable Representation of High-Dimensional Medical Data
Authors
Areej Alsaafin, Abubakr Shafique, Saghir Alfasly, Krishna R. Kalari, H. R. Tizhoosh
Abstract
Integrating artificial intelligence (AI) with healthcare data is rapidly transforming medical diagnostics and driving progress toward precision medicine. However, effectively leveraging multimodal data, particularly digital pathology whole slide images (WSIs) and genomic sequencing, remains a significant challenge due to the intrinsic heterogeneity of these modalities and the need for scalable and interpretable frameworks. Existing diagnostic models typically operate on unimodal data, overlooking critical cross-modal interactions that can yield richer clinical insights. We introduce MarbliX (Multimodal Association and Retrieval with Binary Latent Indexed matriX), a self-supervised framework that learns to embed WSIs and immunogenomic profiles into compact, scalable binary codes, termed ``monograms.'' By optimizing a triplet contrastive objective across modalities, MarbliX captures high-resolution patient similarity in a unified latent space, enabling efficient retrieval of clinically relevant cases and facilitating case-based reasoning. In lung cancer, MarbliX achieves 85--89\% across all evaluation metrics, outperforming histopathology alone (69--71\%) and immunogenomics alone (73--76\%). In kidney cancer, real-valued monograms yield the strongest performance (F1: 80--83\%, accuracy: 87--90\%), while binary monograms score slightly lower (F1: 78--82\%).
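To make the cross-modal triplet idea concrete, the sketch below shows one plausible way to train two encoders, one for WSI-derived features and one for immunogenomic profiles, with a triplet margin objective and then binarize the shared embeddings into compact codes. This is a minimal illustration only: the encoder architectures, feature dimensions, margin, and sign-based binarization are assumptions for exposition, not the MarbliX implementation described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalTripletEncoder(nn.Module):
    """Illustrative two-branch encoder projecting WSI features and
    immunogenomic profiles into a shared latent space. Layer sizes and
    dimensions are hypothetical, not the authors' architecture."""

    def __init__(self, wsi_dim=1024, gen_dim=512, code_dim=128):
        super().__init__()
        self.wsi_encoder = nn.Sequential(
            nn.Linear(wsi_dim, 256), nn.ReLU(), nn.Linear(256, code_dim))
        self.gen_encoder = nn.Sequential(
            nn.Linear(gen_dim, 256), nn.ReLU(), nn.Linear(256, code_dim))

    def forward(self, wsi_feats, gen_feats):
        # L2-normalize so distances are compared on the unit hypersphere
        z_wsi = F.normalize(self.wsi_encoder(wsi_feats), dim=-1)
        z_gen = F.normalize(self.gen_encoder(gen_feats), dim=-1)
        return z_wsi, z_gen

    @staticmethod
    def binarize(z):
        # Sign thresholding gives compact {-1, +1} codes usable for
        # fast indexing and retrieval (a stand-in for "monogram" codes)
        return torch.sign(z)


# Triplet objective: a patient's WSI embedding (anchor) should lie closer
# to that same patient's genomic embedding (positive) than to another
# patient's genomic embedding (negative).
triplet_loss = nn.TripletMarginLoss(margin=0.2)

model = CrossModalTripletEncoder()
wsi = torch.randn(8, 1024)      # batch of WSI feature vectors (synthetic)
gen_pos = torch.randn(8, 512)   # matching patients' immunogenomic profiles
gen_neg = torch.randn(8, 512)   # mismatched patients' profiles

z_wsi, z_pos = model(wsi, gen_pos)
_, z_neg = model(wsi, gen_neg)

loss = triplet_loss(z_wsi, z_pos, z_neg)
loss.backward()

codes = model.binarize(z_wsi.detach())  # binary codes for retrieval
```

In a retrieval setting, codes like these can be compared with Hamming distance, which is what makes binary embeddings attractive for scalable case-based search; the specific indexing scheme used by MarbliX is not reproduced here.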