Multimodal Clinical Trial Outcome Prediction with Large Language Models

Wenhao Zheng, Liaoyaqi Wang, Dongshen Peng, Hongxia Xu, Yun Li, Hongtu Zhu, Tianfan Fu, Huaxiu Yao

TL;DR

The paper tackles the high cost of clinical trial failures by proposing LIFTED, a unified framework that converts diverse trial modalities into natural language descriptions and leverages noise-robust transformer encoders and sparse mixture-of-experts to identify cross-modal information patterns. The approach then dynamically fuses modality representations for final outcome prediction, achieving superior performance across Phases I–III on HINT and CTOD benchmarks. Ablation studies confirm the contributions of language-based modality transformation, representation augmentation, and expert routing to the observed gains. This work demonstrates that a unified text-based modality can effectively replace disparate encoders, enabling scalable integration of new modalities with potential for substantial cost savings in drug development.

Abstract

The clinical trial is a pivotal and costly process, often spanning multiple years and requiring substantial financial resources. Therefore, the development of clinical trial outcome prediction models aims to exclude drugs likely to fail and holds the potential for significant cost savings. Recent data-driven attempts leverage deep learning methods to integrate multimodal data for predicting clinical trial outcomes. However, these approaches rely on manually designed modal-specific encoders, which limits both their extensibility to new modalities and their ability to discern similar information patterns across different modalities. To address these issues, we propose a multimodal mixture-of-experts (LIFTED) approach for clinical trial outcome prediction. Specifically, LIFTED unifies different modality data by transforming them into natural language descriptions. Then, LIFTED constructs unified noise-resilient encoders to extract information from modal-specific language descriptions. Subsequently, a sparse Mixture-of-Experts framework is employed to further refine the representations, enabling LIFTED to identify similar information patterns across different modalities and extract more consistent representations from those patterns using the same expert model. Finally, a mixture-of-experts module is further employed to dynamically integrate different modality representations for prediction, which gives LIFTED the ability to automatically weigh different modalities and pay more attention to critical information. The experiments demonstrate that LIFTED significantly enhances performance in predicting clinical trial outcomes across all three phases compared to the best baseline, showcasing the effectiveness of our proposed key components.
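
The first step described in the abstract, turning modality data into text, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `linearize` and `build_prompt`, the prompt wording, and the example record are all hypothetical, and the sketch only assumes that a structured record is flattened into a string and wrapped in an instruction that a language model could then rewrite as a description.

```python
def linearize(record: dict) -> str:
    """Flatten a key-value record into 'key: value; key: value' text."""
    parts = []
    for key, value in record.items():
        # Join list-like fields so every field becomes a single string.
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(f"{key}: {value}")
    return "; ".join(parts)


def build_prompt(modality: str, record: dict) -> str:
    """Wrap the linearized record in a (hypothetical) instruction."""
    return (
        f"Describe the {modality} information of this clinical trial "
        f"in natural language. {linearize(record)}"
    )


# Illustrative record only; the field names are assumptions.
example = {"drug": "metformin", "diseases": ["type 2 diabetes mellitus"]}
prompt = build_prompt("drug", example)
```

Because every modality ends up as a string, a new modality would only need its own record-to-text step rather than a new encoder, which is the extensibility argument the abstract makes.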

Paper Structure (26 sections, 12 equations, 4 figures, 9 tables, 1 algorithm)

Figures (4)

  • Figure 1: An overview of LIFTED. Step 1: Transforming multimodal data into natural language descriptions, where all modalities are converted into natural language descriptions to facilitate the representation extraction process of the transformer encoders. Step 2: Extracting and combining representations from different modalities, where representations are extracted by the noise-resilient unified encoders and integrated by a Mixture-of-Experts (MoE) framework to make the final predictions.
  • Figure 2: Processes of the linearization and the prompting.
  • Figure 3: The importance weights of the SMoE experts when our model makes a prediction for the knee osteoarthritis patient. Experts 6 and 7 play a crucial role in extracting common information patterns across modalities, while other experts specialize in a single specific modality.
  • Figure 4: The modality importance weights of our model predicting the type 2 diabetes mellitus patient. LIFTED pays more attention to the disease modality as expected, since type 2 diabetes mellitus is hard to cure.
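
The expert and modality importance weights shown in Figures 3 and 4 come from gating functions. The sketch below is a generic top-k gate followed by a weighted sum, written in plain Python; it is an assumption about how such weights are typically computed, not the paper's code. The names `top_k_gate` and `mix` are hypothetical, and in a real model the logits would come from a learned layer rather than being passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    """Keep the k largest logits, softmax over them, zero the rest."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    weights = [0.0] * len(logits)
    for i, p in zip(top, probs):
        weights[i] = p
    return weights


def mix(weights, vectors):
    """Weighted sum of equal-length vectors (expert or modality outputs)."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


# Four hypothetical experts; only the two highest-scoring ones are used.
weights = top_k_gate([0.1, 2.0, -1.0, 2.0], k=2)
combined = mix(weights, [[1.0, 1.0], [2.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
```

With `k` smaller than the number of experts the gate is sparse, as in the expert routing of Figure 3; with `k` equal to the number of inputs it reduces to an ordinary softmax weighting, which is closer to the dense modality weights of Figure 4.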