Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings

Gonçalo Gomes; Isabel Coutinho; Bruno Martins

TL;DR

This work tackles automated ICD code assignment from long, complex clinical notes by introducing a Transformer-based framework that blends two text-encoding regimes (Longformer-enabled LE and Chunk Encoding CE) with a label-embedding scheme built from diverse ICD synonyms. A multi-synonym attention mechanism leverages UMLS and external knowledge bases to enrich code representations, while a joint learning objective $\mathcal{L} = \mathcal{L}_C + \lambda \mathcal{L}_Q$ couples multi-label classification with text quantification to improve calibration. Empirically, CE with Multi-Synonyms Attention and joint quantification achieves state-of-the-art results on MIMIC-III-50 and MIMIC-III-clean, with the Huber-based calibration variant (CLQ_Huber) providing robust calibration without sacrificing classification accuracy. The approach yields well-calibrated probabilities that support downstream tasks like prevalence estimation and quantification, and it demonstrates strong few-shot behavior for rare codes. Future work could exploit ICD hierarchical structure and test alternative calibration-focused losses to further boost performance and reliability.
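The joint objective $\mathcal{L} = \mathcal{L}_C + \lambda \mathcal{L}_Q$ can be illustrated with a minimal, dependency-free sketch. The summary above does not spell out either term, so the choices here are assumptions: $\mathcal{L}_C$ is taken to be mean binary cross-entropy, and $\mathcal{L}_Q$ is taken to be a Huber loss between the estimated prevalence of each code (the mean predicted probability over a batch) and its true prevalence (the mean of the gold labels). The function names and the `lam` and `delta` parameters are hypothetical.

```python
import math


def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over all (document, code) pairs."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def huber(error, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def prevalence(matrix):
    """Per-code mean over the batch (rows are documents, columns are codes)."""
    n_docs = len(matrix)
    return [sum(row[j] for row in matrix) / n_docs for j in range(len(matrix[0]))]


def joint_loss(probs, labels, lam=1.0, delta=1.0):
    """Return (L, L_C, L_Q) with L = L_C + lam * L_Q.

    Assumption: L_Q compares batch-level predicted and true prevalence;
    the paper's exact formulation may differ.
    """
    l_c = bce(probs, labels)
    est, true = prevalence(probs), prevalence(labels)
    l_q = sum(huber(e - t, delta) for e, t in zip(est, true)) / len(est)
    return l_c + lam * l_q, l_c, l_q


# Two documents, two codes: predicted probabilities and gold labels.
probs = [[0.9, 0.1], [0.8, 0.2]]
labels = [[1, 0], [1, 0]]
total, l_c, l_q = joint_loss(probs, labels, lam=0.5)
```

Under these assumptions, the quantification term penalizes a model whose average confidence for a code drifts away from how often that code actually occurs, which is one way a quantification objective can push predicted probabilities toward better calibration.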

Abstract

Although the International Classification of Diseases (ICD) has been adopted worldwide, manually assigning ICD codes to clinical text is time-consuming, error-prone, and expensive, motivating the development of automated approaches. This paper describes a novel approach for automated ICD coding that combines several ideas from previous related work. We employ a strong Transformer-based model as a text encoder and, to handle lengthy clinical narratives, we explore either (a) adapting the base encoder model into a Longformer, or (b) dividing the text into chunks and processing each chunk independently. The representations produced by the encoder are combined with a label embedding mechanism that exploits diverse ICD code synonyms. Experiments with different splits of the MIMIC-III dataset show that the proposed approach outperforms the current state-of-the-art models in ICD coding, with the label embeddings contributing significantly to the improvement. Our approach also leads to properly calibrated classification results, which can effectively inform downstream tasks such as quantification.

Paper Structure (18 sections, 14 equations, 5 figures, 13 tables)

Introduction
Related Work
Proposed Approach
Clinical Text Modeling
Multi-Synonyms Attention
Joint Classification and Quantification
Experimental Evaluation
Datasets
Evaluation Metrics
Implementation Details
Experiments and Results
Classification
Quantification
Conclusions and Future Work
Appendix
...and 3 more sections

Figures (5)

Figure 1: A simple classification architecture that considers the Chunk Encoding (CE) approach.
Figure 2: Smooth document segmentation with 255-token overlaps. Each chunk ends with the sentence separation token [SEP] characteristic of BERT-type models, completing $512$ tokens per chunk.
Figure 3: The classification architecture that combines the CE with a multi-synonyms attention mechanism.
Figure 4: Relative frequency, absolute error, and F1 scores for each ICD code on the MIMIC-III-50 dataset.
Figure 5: Estimated versus real prevalence for the two most frequent (top) and rarest (bottom) ICD codes in the MIMIC-III-50 dataset.
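
The overlapping segmentation described in the caption of Figure 2 can be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes each chunk holds 511 content tokens plus a trailing [SEP], so that consecutive chunks share 255 tokens and the window advances by 256 tokens. Whether a [CLS] token is also reserved, and how the final short chunk is padded, is not stated in the caption. The function name `chunk_tokens` and the `sep_id` argument are hypothetical.

```python
def chunk_tokens(token_ids, sep_id, chunk_len=512, overlap=255):
    """Split a token sequence into overlapping chunks, each ending with sep_id.

    Assumption: one position per chunk is reserved for [SEP]; the last
    chunk may be shorter than chunk_len and is left unpadded here.
    """
    body = chunk_len - 1      # content tokens per chunk (511 by default)
    stride = body - overlap   # how far the window advances (256 by default)
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_len - 1")

    chunks = []
    start = 0
    while True:
        window = token_ids[start:start + body]
        chunks.append(window + [sep_id])
        if start + body >= len(token_ids):  # this window reached the end
            break
        start += stride
    return chunks


# A 1000-token document with a placeholder [SEP] id of -1.
document = list(range(1000))
chunks = chunk_tokens(document, sep_id=-1)
```

With these defaults, every full chunk has exactly 512 positions, and the last 255 content tokens of one chunk reappear as the first 255 tokens of the next, so text near a chunk boundary is seen with context on both sides.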
