On Preserving the Knowledge of Long Clinical Texts

Mohammad Junayed Hasan, Suhra Noor, Mohammad Ashrafuzzaman Khan

TL;DR

This work addresses the fixed input length of transformer encoders when processing long clinical notes. It proposes an integrated framework that aggregates predictions across text chunks and fuses multiple pre-trained clinical BERT-based encoders. On MIMIC-III admission notes, for mortality and length-of-stay prediction, text aggregation generally outperforms the individual baselines, and combining aggregation with ensembling yields the best overall performance. The main practical insight is that overlapping chunks and model fusion help preserve critical information across long documents, improving robustness and predictive accuracy in clinical settings. The study also discusses limitations, notably computational overhead, and suggests future directions such as model distillation and targeted clinical-domain pre-training for summarization tasks.

Abstract

Clinical texts, such as admission notes, discharge summaries, and progress notes, contain rich and valuable information that can be used for clinical decision making. However, a severe bottleneck in using transformer encoders on clinical texts is their input length limit: these encoders accept inputs only up to a fixed maximum length, so any text beyond that limit is discarded. Processing only part of a clinical text risks losing vital knowledge. This paper proposes a novel method to preserve the knowledge of long clinical texts using aggregated ensembles of transformer encoders. Previous studies used either ensembling or aggregation; we study the effect of fusing the two. We fine-tuned several pre-trained BERT-like transformer encoders on two clinical outcome tasks: mortality prediction and length-of-stay prediction. Our method achieved better results than all baseline models on these prediction tasks for long clinical notes. We conducted extensive experiments on the admission notes of the MIMIC-III clinical database by combining multiple unstructured and high-dimensional datasets, demonstrating our method's effectiveness and superiority over existing approaches. This study shows that fusing ensembling and aggregation improves model performance on clinical prediction tasks, specifically mortality and length of hospital stay.
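The truncation problem described above can be made concrete with a minimal sketch of overlapping chunking. This is an illustration only, not the authors' implementation: the function name `chunk_with_overlap` and the default values for `max_len` and `overlap` are assumptions, and a real pipeline would operate on tokenizer output rather than plain strings.

```python
from typing import List, Sequence


def chunk_with_overlap(
    tokens: Sequence[str],
    max_len: int = 512,   # assumed encoder input limit, not taken from the paper
    overlap: int = 64,    # assumed overlap size, not taken from the paper
) -> List[List[str]]:
    """Split a long token sequence into windows of at most `max_len` tokens.

    Consecutive windows share `overlap` tokens, so content near a window
    boundary appears in two chunks instead of being cut in half.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if len(tokens) == 0:
        return []

    stride = max_len - overlap  # how far the window advances each step
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the current window already reaches the end of the note
        start += stride
    return chunks


if __name__ == "__main__":
    note = [f"tok{i}" for i in range(10)]
    for chunk in chunk_with_overlap(note, max_len=4, overlap=2):
        print(chunk)
```

Every token lands in at least one chunk, so nothing is dropped the way it would be under plain truncation. The cost is that the encoder runs once per chunk, which is consistent with the computational overhead noted in the TL;DR.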

Paper Structure

This paper contains 27 sections, 6 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Average performance improvement, in macro-averaged % AUROC, from the baseline approach to the ensemble + aggregation approach used in our methodology, for the clinical tasks of mortality prediction and length-of-stay prediction on admission notes from the MIMIC-III clinical database, using context-constrained large language models such as BERT and its derivatives.
  • Figure 2: (a) Ensemble of multiple BERT-based models, (b) Aggregation process to handle long clinical texts, and (c) Unified ensemble and aggregation.
  • Figure 3: Comparative analysis showing the gradual performance improvement, in macro-averaged % AUROC, across preliminary approaches, selected baselines, ensemble models, aggregation models without overlap, aggregation models with overlap, ensemble + aggregation without overlap, and finally ensemble + aggregation with overlap, which gives the best results for the clinical tasks of mortality prediction and length-of-stay prediction.
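
The three setups named in Figure 2 (ensemble, aggregation, and their combination) can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's method: the source text does not give the aggregation or fusion formulas, so a plain mean is used for both steps, and the function names and model labels are hypothetical.

```python
from statistics import fmean
from typing import Dict, Sequence


def aggregate_chunks(chunk_probs: Sequence[float]) -> float:
    """Combine the per-chunk probabilities of one model into one note-level score.

    Assumption: a simple mean over chunks. The paper's actual aggregation
    rule is not stated in the text above and may differ.
    """
    if len(chunk_probs) == 0:
        raise ValueError("at least one chunk prediction is required")
    return fmean(chunk_probs)


def ensemble_aggregate(per_model_chunk_probs: Dict[str, Sequence[float]]) -> float:
    """Aggregate over chunks within each model, then average across models.

    Assumption: unweighted averaging across models.
    """
    if not per_model_chunk_probs:
        raise ValueError("at least one model is required")
    per_model_scores = [
        aggregate_chunks(probs) for probs in per_model_chunk_probs.values()
    ]
    return fmean(per_model_scores)


if __name__ == "__main__":
    # Hypothetical chunk-level probabilities for one admission note.
    predictions = {
        "encoder_a": [0.2, 0.4],
        "encoder_b": [0.6, 0.8, 1.0],
    }
    print(ensemble_aggregate(predictions))
```

Averaging within each model first keeps a model that produced more chunks (for example, because of a different tokenizer) from dominating the fused score.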