FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction

Muhao Xu, Zhenfeng Zhu, Youru Li, Shuai Zheng, Yawei Zhao, Kunlun He, Yao Zhao

TL;DR

FlexCare tackles the challenge of predicting multiple healthcare outcomes from incomplete multimodal EHR data by introducing a unified multitask framework that runs as a series of asynchronous single-task predictions. It combines a task-agnostic multimodal information extractor with a task-guided hierarchical fusion built on a task/modality-aware Mixture of Experts, plus task-specific prediction heads. The approach uses a token-based scheme with covariance regularization to decorrelate modality-combination representations and enable effective cross-task synergy. Empirical results on MIMIC-IV, MIMIC-CXR, and MIMIC-NOTE demonstrate competitive performance across six tasks and illustrate the model’s extensibility to new tasks with limited data.
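
To make the covariance regularization idea concrete, here is a minimal sketch. It assumes a generic off-diagonal covariance penalty (in the style of common decorrelation losses) computed over a batch of representation vectors; the function name `covariance_penalty` and the normalization by the feature dimension are illustrative assumptions, and the loss actually used in FlexCare may differ.

```python
from typing import List


def covariance_penalty(reps: List[List[float]]) -> float:
    """Sum of squared off-diagonal covariance entries, divided by the dimension d.

    `reps` is a batch of n representation vectors, each of dimension d.
    NOTE: hypothetical helper for illustration, not the paper's exact loss.
    """
    n = len(reps)
    if n < 2:
        raise ValueError("need at least two samples to estimate covariance")
    d = len(reps[0])

    # Center every feature dimension over the batch.
    means = [sum(row[j] for row in reps) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in reps]

    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue  # the diagonal (variance) terms are not penalized
            cov_ij = sum(c[i] * c[j] for c in centered) / (n - 1)
            penalty += cov_ij ** 2
    return penalty / d


# Two perfectly correlated dimensions are penalized ...
correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
# ... while two uncorrelated dimensions are not.
uncorrelated = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(covariance_penalty(correlated), covariance_penalty(uncorrelated))
```

Adding such a term to the training loss pushes different feature dimensions to carry non-redundant information, which is the intuition behind decorrelating the representations.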

Abstract

Multimodal electronic health record (EHR) data can offer a holistic assessment of a patient's health status, supporting various predictive healthcare tasks. Recently, several studies have embraced the multitask learning approach in the healthcare domain, exploiting the inherent correlations among clinical tasks to predict multiple outcomes simultaneously. However, existing methods require every sample to have complete labels for all tasks, which places heavy demands on the data and restricts the flexibility of the model. Meanwhile, within a multitask framework with multimodal inputs, comprehensively accounting for the information disparity among modalities and among tasks remains a challenging problem. To tackle these issues, a unified healthcare prediction model, named FlexCare, is proposed to flexibly accommodate incomplete multimodal inputs, promoting adaptation to multiple healthcare tasks. The proposed model breaks the conventional paradigm of parallel multitask prediction by decomposing it into a series of asynchronous single-task predictions. Specifically, a task-agnostic multimodal information extraction module is presented to capture decorrelated representations of diverse intra- and inter-modality patterns. Taking full account of the information disparities between different modalities and different tasks, we present a task-guided hierarchical multimodal fusion module that integrates the refined modality-level representations into an individual patient-level representation. Experimental results on multiple tasks from the MIMIC-IV, MIMIC-CXR, and MIMIC-NOTE datasets demonstrate the effectiveness of the proposed method. Additionally, further analysis underscores the feasibility and potential of employing such a multitask strategy in the healthcare domain. The source code is available at https://github.com/mhxu1998/FlexCare.
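
The decomposition into asynchronous single-task predictions can be illustrated with a small scheduling sketch. It is not taken from the FlexCare code base: the task names, modality keys, and the helpers `group_by_task` and `single_task_schedule` are hypothetical, and the round-robin order is only one plausible way to interleave tasks. The point is that a sample contributes only to the tasks it has labels for, so no sample needs complete labels.

```python
from typing import Any, Dict, List, Tuple

Pair = Tuple[Dict[str, Any], Any]  # (available modality inputs, label)


def group_by_task(samples: List[Dict[str, Any]], tasks: List[str]) -> Dict[str, List[Pair]]:
    """Collect, for each task, the samples that actually carry a label for it."""
    per_task: Dict[str, List[Pair]] = {t: [] for t in tasks}
    for sample in samples:
        for task, label in sample["labels"].items():
            if label is not None and task in per_task:
                per_task[task].append((sample["inputs"], label))
    return per_task


def single_task_schedule(per_task: Dict[str, List[Pair]], batch_size: int) -> List[Tuple[str, List[Pair]]]:
    """Build a list of (task, batch) steps; every step belongs to exactly one task."""
    queues = {
        task: [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
        for task, pairs in per_task.items()
    }
    schedule: List[Tuple[str, List[Pair]]] = []
    while any(queues.values()):          # stop once every task queue is empty
        for task, batches in queues.items():
            if batches:
                schedule.append((task, batches.pop(0)))
    return schedule


# Hypothetical samples with incomplete modalities and incomplete labels.
samples = [
    {"inputs": {"ehr": [0.1, 0.2]}, "labels": {"mortality": 1, "readmission": 0}},
    {"inputs": {"ehr": [0.3, 0.4], "cxr": "img_1"}, "labels": {"mortality": 0}},
    {"inputs": {"note": "text_1"}, "labels": {"mortality": None, "readmission": 1}},
]
schedule = single_task_schedule(group_by_task(samples, ["mortality", "readmission"]), batch_size=1)
print([task for task, _ in schedule])
```

In a full model, each scheduled step would pass its batch through the shared extractor and fusion modules and then through the prediction head of that step's task only.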

Paper Structure (32 sections, 13 equations, 7 figures, 6 tables, 1 algorithm)

Figures (7)

  • Figure 1: (a) Illustration of the multimodal data and the multitask predictions during a patient's admission; (b) The number of samples from multiple tasks that depend on time-series data in the MIMIC-IV dataset; (c) The number of samples with different modality data in the MIMIC-IV dataset.
  • Figure 2: (a) Single-task model; (b) Conventional multi-task model; (c) Our proposed flexible multi-task model.
  • Figure 3: The framework of the FlexCare model. It consists of three modules: (a) Task-agnostic multimodal information extraction; (b) Task-guided hierarchical multimodal fusion; (c) Task-specific prediction heads.
  • Figure 4: Results of ablation study.
  • Figure 5: Visualization of patient-level representation learned w/o and w/ the task token.
  • ...and 2 more figures