On Preserving the Knowledge of Long Clinical Texts
Mohammad Junayed Hasan, Suhra Noor, Mohammad Ashrafuzzaman Khan
TL;DR
This work tackles the fixed-length input constraint that transformer encoders face when processing long clinical notes. It proposes an integrated framework that aggregates predictions across text chunks and fuses multiple pre-trained clinical BERT-based encoders. On MIMIC-III admission notes, for mortality and length-of-stay prediction, text aggregation generally outperforms the individual baselines, and combining aggregation with ensemble methods yields the best overall performance. The main practical insight is that overlapping chunks and model fusion help preserve critical information across long documents, improving robustness and predictive accuracy in clinical settings. The study also discusses limitations, notably computational overhead, and suggests future directions such as model distillation and targeted clinical-domain pre-training for summarization tasks.
Abstract
Clinical texts, such as admission notes, discharge summaries, and progress notes, contain rich and valuable information that can be used for clinical decision making. However, a severe bottleneck in using transformer encoders for clinical texts is their input length limit: transformer-based encoders accept only fixed-length inputs, so they discard part of a long medical text, and vital knowledge may be lost when only part of a note is processed. This paper proposes a novel method to preserve the knowledge of long clinical texts using aggregated ensembles of transformer encoders. Previous studies used either ensembling or aggregation alone; we studied the effect of fusing the two. We fine-tuned several pre-trained BERT-like transformer encoders on two clinical outcome tasks: mortality prediction and length-of-stay prediction. Our method achieved better results than all baseline models on these prediction tasks for long clinical notes. We conducted extensive experiments on the admission notes of the MIMIC-III clinical database by combining multiple unstructured and high-dimensional datasets, demonstrating our method's effectiveness and superiority over existing approaches. This study shows that fusing ensemble and aggregation improves model performance on clinical prediction tasks, particularly mortality and length-of-hospital-stay prediction.
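
To make the idea of fusing aggregation with ensembling concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a note is split into overlapping token windows, that each encoder returns one probability per window, and that both the per-note aggregation and the cross-model ensemble are plain averages. The function names, the default window size and overlap, and the use of the mean are all illustrative assumptions; the abstract does not specify the aggregation or fusion functions.

```python
from statistics import mean
from typing import Callable, List, Sequence

# A "model" here is any callable mapping one chunk of tokens to a probability.
# This stands in for a fine-tuned encoder plus classification head (assumption).
ChunkModel = Callable[[List[str]], float]


def split_into_chunks(tokens: Sequence[str], max_len: int = 512,
                      overlap: int = 64) -> List[List[str]]:
    """Split a token sequence into windows of at most max_len tokens,
    where consecutive windows share `overlap` tokens."""
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("require max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the note
        start += step
    return chunks


def aggregate_note(tokens: Sequence[str], model: ChunkModel,
                   max_len: int = 512, overlap: int = 64) -> float:
    """Aggregation: score every chunk with one model and average the scores
    (the mean is an illustrative choice, not necessarily the paper's)."""
    chunks = split_into_chunks(tokens, max_len, overlap)
    return mean(model(chunk) for chunk in chunks)


def ensemble_predict(tokens: Sequence[str], models: Sequence[ChunkModel],
                     max_len: int = 512, overlap: int = 64) -> float:
    """Fusion: average the per-note aggregated scores of several models."""
    return mean(aggregate_note(tokens, m, max_len, overlap) for m in models)
```

Under these assumptions, a finding that appears only late in a note still contributes to the final score, whereas a model that sees only the first window would miss it entirely.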
