P-Transformer: A Prompt-based Multimodal Transformer Architecture For Medical Tabular Data

Yucheng Ruan, Xiang Lan, Daniel J. Tan, Hairil Rizal Abdullah, Mengling Feng

TL;DR

The paper tackles outcome prediction from medical tabular data by integrating unstructured text and structured features through a prompt-based multimodal transformer. PTransformer comprises a tabular cell embedding generator, which converts each cell into a textual prompt and encodes it with a frozen pre-trained sentence encoder, and a tabular transformer, which fuses the cell embeddings into patient representations for prediction. Evaluated on PASA and MIMIC-III across surgical duration, length of stay (LOS), and mortality tasks, it outperforms state-of-the-art baselines, with reported reductions in RMSE/MAE of 10.9%/11.0% and 0.5%/2.2% and improvements in BACC/AUROC of 1.6%/0.8%. The results demonstrate the value of medical prompts and text-informed embeddings for medical tabular prediction; future work will explore soft prompts for task-specific adaptation.

Abstract

Medical tabular data, abundant in Electronic Health Records (EHRs), is a valuable resource for diverse medical tasks such as risk prediction. While deep learning approaches, particularly transformer-based models, have shown remarkable performance in tabular data prediction, existing work still faces problems when adapted to the medical domain, such as ignoring unstructured free text and underutilizing the textual information in structured data. To address these issues, we propose PTransformer, a Prompt-based multimodal Transformer architecture designed specifically for medical tabular data. This framework consists of two critical components: a tabular cell embedding generator and a tabular transformer. The former efficiently encodes diverse modalities from both structured and unstructured tabular data into a harmonized language semantic space with the help of a pre-trained sentence encoder and medical prompts. The latter integrates cell representations to generate patient embeddings for various medical tasks. In comprehensive experiments on two real-world datasets for three medical tasks, PTransformer improved predictive performance over state-of-the-art (SOTA) baselines by 10.9%/11.0% on RMSE/MAE, 0.5%/2.2% on RMSE/MAE, and 1.6%/0.8% on BACC/AUROC.
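
To make the first component more concrete, here is a minimal sketch of how a table row could be turned into one textual prompt per cell before encoding. The template wording ("The {column} is {value}."), the handling of missing cells, the function names `cell_to_prompt` and `row_to_prompts`, and the example patient record are all assumptions for illustration; the paper's actual medical prompts may differ. In a full pipeline each prompt would then be passed to a frozen pre-trained sentence encoder, which is not shown here.

```python
from typing import Dict, List, Optional


def cell_to_prompt(column: str, value: Optional[object]) -> str:
    """Render one table cell as a short sentence (hypothetical template)."""
    # Assumption: empty or missing cells are verbalized as "unknown".
    if value is None or (isinstance(value, str) and not value.strip()):
        return f"The {column} is unknown."
    return f"The {column} is {value}."


def row_to_prompts(row: Dict[str, object]) -> List[str]:
    """Produce one prompt per cell, in column order."""
    return [cell_to_prompt(column, value) for column, value in row.items()]


# Hypothetical patient record mixing numerical, categorical, and free-text cells.
patient = {
    "age": 67,
    "gender": "female",
    "surgical procedure": "total knee replacement",
    "clinical note": None,
}

prompts = row_to_prompts(patient)
for prompt in prompts:
    print(prompt)
```

Because every cell, whatever its original type, ends up as a sentence, a single text encoder can map all of them into the same embedding space, which is the idea behind the "harmonized language semantic space" described in the abstract.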

Paper Structure (21 sections, 2 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The comparison between existing work and our proposed model. Main modules of the proposed framework: (1) Tabular cell embedding generator, (2) Tabular transformer, (3) Prediction head.
  • Figure 2: Overview of tabular cell embedding generator.
  • Figure 3: Overview of tabular transformer and prediction head.
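
As a rough illustration of the last two modules named in the figure captions, the sketch below fuses a set of cell embeddings with a single self-attention step, pools them into a patient embedding, and applies a linear prediction head. It is a deliberately simplified stand-in and not the paper's architecture: the learned query/key/value projections are replaced by the identity, there is one head and one layer, mean pooling is an assumption (the paper may pool differently), and the cell embeddings and head weights are made-up toy values.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(cells: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections (simplification)."""
    dim = len(cells[0])
    attended = []
    for query in cells:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in cells])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, cells)) for i in range(dim)]
        )
    return attended


def patient_embedding(cells: List[Vector]) -> Vector:
    """Fuse cell embeddings, then mean-pool them (pooling choice is an assumption)."""
    attended = self_attention(cells)
    return [sum(v[i] for v in attended) / len(attended) for i in range(len(attended[0]))]


def predict(embedding: Vector, weights: Vector, bias: float) -> float:
    """Linear prediction head, e.g. a regression output."""
    return dot(embedding, weights) + bias


# Toy cell embeddings; in practice these would come from the sentence encoder.
cell_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
embedding = patient_embedding(cell_embeddings)
score = predict(embedding, weights=[0.5, -0.2, 0.1], bias=0.0)
print(embedding, score)
```

A trained model would learn the projections and head weights from data; the point here is only the data flow from per-cell embeddings to a single patient-level vector to a task output.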