Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning
Amnon Bleich, Antje Linnemann, Bjoern H. Diem, Tim OF Conrad
TL;DR
This paper addresses the lack of scalable methods for automated text generation from ECG signals by adapting image-captioning encoder–decoder architectures to ECG data. It couples a 1D ResNet34 encoder with a Transformer or LSTM decoder and trains on free-text clinical reports from the PTB-XL and ICM datasets, after preprocessing steps that include translation and abbreviation unification. On the official PTB-XL splits, the best configuration reaches a METEOR score of 55.53%, compared to 24.51% for the reference model by Qiu et al. Translation and careful preprocessing provide significant gains, whereas pre-training the encoder on rhythm labels offers limited benefit. The study provides a reproducible benchmark and demonstrates robustness across diverse ECG formats, laying the groundwork for extension to larger datasets, to other 1D medical signals such as EEG, and to potential clinical decision support applications.
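As a rough illustration of what abbreviation unification could look like, the sketch below expands a few report abbreviations into one canonical spelling before training. The mapping `ABBREVIATIONS` and the function `unify_abbreviations` are hypothetical names introduced here for illustration; the actual abbreviation list and rules used in the paper are not given in this section.

```python
import re

# Hypothetical mapping, for illustration only; not the list used in the paper.
ABBREVIATIONS = {
    "sr": "sinus rhythm",
    "af": "atrial fibrillation",
    "afib": "atrial fibrillation",
    "lbbb": "left bundle branch block",
}


def unify_abbreviations(report, mapping=ABBREVIATIONS):
    """Lowercase a report and expand known abbreviations to one canonical form."""

    def expand(match):
        token = match.group(0)
        # Unknown tokens are returned unchanged.
        return mapping.get(token, token)

    # Match alphabetic tokens only, so punctuation and digits are left in place.
    return re.sub(r"[a-z]+", expand, report.lower())


print(unify_abbreviations("SR, no AFib."))
```

Mapping variants such as "af" and "afib" to the same phrase reduces vocabulary fragmentation, so the decoder sees one target wording per finding instead of several.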
Abstract
Recent advances in deep learning and natural language generation have significantly improved image captioning, enabling automated, human-like descriptions of visual content. In this work, we apply these captioning techniques to generate clinician-like interpretations of ECG data. As training data, we use existing ECG datasets accompanied by free-text reports authored by healthcare professionals (HCPs). These reports, while often inconsistent, provide a valuable foundation for automated learning. We introduce an encoder-decoder-based method that uses these reports to train models to generate detailed descriptions of ECG episodes. This represents a significant advancement in ECG analysis automation, with potential applications in zero-shot classification and automated clinical decision support. The model is tested on several datasets, including both 1-lead and 12-lead ECGs. It significantly outperforms the state-of-the-art reference model by Qiu et al., achieving a METEOR score of 55.53% compared to 24.51% for the reference model. Furthermore, we discuss several key design choices, providing a comprehensive overview of current challenges and innovations in this domain. The source code for this research is publicly available in our Git repository: https://git.zib.de/ableich/ecg-comment-generation-public
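To make the reported metric more concrete, the following is a simplified sketch of a METEOR-style score. It assumes exact token matching only (no stemming or synonym matching), a greedy left-to-right alignment instead of the chunk-minimizing alignment, and the constants from the original METEOR formulation (recall weighted 9:1 in the harmonic mean, fragmentation penalty 0.5 · (chunks/matches)³). The function name `meteor_exact` is hypothetical, and the evaluation code used in the paper may differ.

```python
def meteor_exact(reference, hypothesis):
    """Simplified METEOR: exact unigram matches with a greedy alignment."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref or not hyp:
        return 0.0

    # Greedily align each hypothesis token to the first unused identical
    # reference token.
    used = [False] * len(ref)
    alignment = []  # (hypothesis index, reference index) pairs
    for i, token in enumerate(hyp):
        for j, ref_token in enumerate(ref):
            if not used[j] and token == ref_token:
                used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Harmonic mean that weights recall nine times as heavily as precision.
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that is contiguous in both sequences.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1

    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)


score = meteor_exact(
    "sinus rhythm normal ecg",
    "sinus rhythm otherwise normal ecg",
)
print(f"{score:.4f}")
```

Because of the fragmentation penalty, even an identical hypothesis scores slightly below 1.0 for short sentences, and a hypothesis with the right words in scattered order scores lower than one that reproduces them in contiguous runs.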
