AdapDISCOM: An Adaptive Sparse Regression Method for High-Dimensional Multimodal Data With Block-Wise Missingness and Measurement Errors
Maimouna Baldé, Abdoul O. Diakité, Claudia Moreau, Gleb Bezgin, Nikhil Bhagwat, Pedro Rosa-Neto, Jean-Baptiste Poline, Simon Girard, Amadou Barry
TL;DR
AdapDISCOM presents an adaptive sparse regression framework that jointly addresses block-wise missingness and additive measurement error in high-dimensional multimodal data. By deriving modality-specific covariance weighting and providing robust (Huber) and fast variants, it achieves superior prediction and reliable biomarker selection across heterogeneous settings, as demonstrated in extensive simulations and ADNI data. The work offers strong theoretical guarantees, including convergence, model selection consistency, and efficient prediction, while delivering scalable software for practical use in biomedical research. This approach enhances the reliability of multimodal analyses in the presence of realistic data imperfections, enabling more accurate inference and biomarker discovery.
Abstract
Multimodal high-dimensional data are increasingly prevalent in biomedical research, yet they are often compromised by block-wise missingness and measurement errors, posing significant challenges for statistical inference and prediction. We propose AdapDISCOM, a novel adaptive direct sparse regression method that simultaneously addresses these two pervasive issues. Building on the DISCOM framework, AdapDISCOM introduces modality-specific weighting schemes to account for heterogeneity in data structures and error magnitudes across modalities. We establish the theoretical properties of AdapDISCOM, including model selection consistency and convergence rates under sub-Gaussian and heavy-tailed settings, and develop robust and computationally efficient variants (AdapDISCOM-Huber and Fast-AdapDISCOM). Extensive simulations demonstrate that AdapDISCOM consistently outperforms existing methods such as DISCOM, SCOM, and CoCoLasso, particularly under heterogeneous contamination and heavy-tailed distributions. Finally, we apply AdapDISCOM to Alzheimers Disease Neuroimaging Initiative (ADNI) data, demonstrating improved prediction of cognitive scores and reliable selection of established biomarkers, even with substantial missingness and measurement errors. AdapDISCOM provides a flexible, robust, and scalable framework for high-dimensional multimodal data analysis under realistic data imperfections.
