Table of Contents
Fetching ...

Learning from Limited and Incomplete Data: A Multimodal Framework for Predicting Pathological Response in NSCLC

Alice Natalina Caragliano, Giulia Farina, Fatih Aksu, Camillo Maria Caruso, Claudia Tacconi, Carlo Greco, Lorenzo Nibid, Edy Ippolito, Michele Fiore, Giuseppe Perrone, Sara Ramella, Paolo Soda, Valerio Guarrasi

Abstract

Major pathological response (pR) following neoadjuvant therapy is a clinically meaningful endpoint in non-small cell lung cancer, strongly associated with improved survival. However, accurate preoperative prediction of pR remains challenging, particularly in real-world clinical settings characterized by limited data availability and incomplete clinical profiles. In this study, we propose a multimodal deep learning framework designed to address these constraints by integrating foundation model-based CT feature extraction with a missing-aware architecture for clinical variables. This approach enables robust learning from small cohorts while explicitly modeling missing clinical information, without relying on conventional imputation strategies. A weighted fusion mechanism is employed to leverage the complementary contributions of imaging and clinical modalities, yielding a multimodal model that consistently outperforms both unimodal imaging and clinical baselines. These findings underscore the added value of integrating heterogeneous data sources and highlight the potential of multimodal, missing-aware systems to support pR prediction under realistic clinical conditions.

Learning from Limited and Incomplete Data: A Multimodal Framework for Predicting Pathological Response in NSCLC

Abstract

Major pathological response (pR) following neoadjuvant therapy is a clinically meaningful endpoint in non-small cell lung cancer, strongly associated with improved survival. However, accurate preoperative prediction of pR remains challenging, particularly in real-world clinical settings characterized by limited data availability and incomplete clinical profiles. In this study, we propose a multimodal deep learning framework designed to address these constraints by integrating foundation model-based CT feature extraction with a missing-aware architecture for clinical variables. This approach enables robust learning from small cohorts while explicitly modeling missing clinical information, without relying on conventional imputation strategies. A weighted fusion mechanism is employed to leverage the complementary contributions of imaging and clinical modalities, yielding a multimodal model that consistently outperforms both unimodal imaging and clinical baselines. These findings underscore the added value of integrating heterogeneous data sources and highlight the potential of multimodal, missing-aware systems to support pR prediction under realistic clinical conditions.
Paper Structure (23 sections, 6 equations, 2 figures, 1 table)

This paper contains 23 sections, 6 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of the proposed multimodal framework, organized into three main stages: feature extraction, model training, and multimodal fusion. Imaging and clinical modalities are processed through two parallel branches, and the resulting unimodal outputs are subsequently fused to predict pR.
  • Figure 2: Effect of fusion weight $\alpha$ on model performance. Star markers indicate the unimodal imaging model ($\alpha=0$) and the unimodal missing-aware clinical model ($\alpha=1$). The highest performance is achieved at $\alpha=0.7$, underscoring the complementary contribution of clinical and imaging modalities.