CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning

Junyu Luo, Xiaochen Wang, Jiaqi Wang, Aofei Chang, Yaqing Wang, Fenglong Ma

TL;DR

CoRelation tackles automatic ICD coding by learning contextualized, note-specific relationships among ICD codes. It combines contextualized code embeddings, a per-note flexible bipartite graph for relation learning, a graph-transformer update, and a self-adaptive gate to fuse direct and relation-enhanced predictions. A selective training strategy reduces computational cost while maintaining accuracy. Experiments on six public ICD datasets show state-of-the-art performance with substantially fewer parameters than PLM-based methods, and analysis demonstrates interpretable learned code relations.

Abstract

Automatic International Classification of Diseases (ICD) coding plays a crucial role in the extraction of relevant information from clinical notes for proper recording and billing. One of the most important directions for boosting the performance of automatic ICD coding is modeling ICD code relations. However, current methods insufficiently model the intricate relationships among ICD codes and often overlook the importance of context in clinical notes. In this paper, we propose a novel approach, a contextualized and flexible framework, to enhance the learning of ICD code representations. Our approach, unlike existing methods, employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations. We evaluate our approach on six public ICD coding datasets and the experimental results demonstrate the effectiveness of our approach compared to state-of-the-art baselines.

Paper Structure

This paper contains 28 sections, 14 equations, 3 figures, 11 tables, 2 algorithms.

Figures (3)

  • Figure 1: The Proposed CoRelation structure.
  • Figure 2: We modify the original ICD ontology into a directed flexible bipartite graph. There is an edge for each code pair, and the edge type depends on the distance between two codes on the original ICD ontology.
  • Figure 3: Two typical learned code relation cases.
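
The edge-typing idea in the Figure 2 caption (one directed edge per code pair, typed by the distance between the two codes on the ontology) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy hierarchy in `PARENT`, the helper names, and the choice to cap distances at `max_type` are hypothetical and are not taken from the paper. The sketch also covers only edge typing, not the bipartite layout or the relation learning itself.

```python
from itertools import permutations

# Hypothetical toy ontology (child -> parent); None marks the root.
# These are placeholder labels, not real ICD codes.
PARENT = {
    "root": None,
    "A": "root",
    "B": "root",
    "A1": "A",
    "A2": "A",
    "B1": "B",
}


def path_to_root(code):
    """Return the list of nodes from `code` up to and including the root."""
    path = []
    while code is not None:
        path.append(code)
        code = PARENT[code]
    return path


def tree_distance(a, b):
    """Number of ontology edges on the path between codes a and b."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:  # first shared ancestor going up from b
            return steps_from_a[node] + steps_from_b
    raise ValueError("codes do not share a root")


def edge_type(a, b, max_type=4):
    """Assumed rule: the edge type is the tree distance, capped at max_type."""
    return min(tree_distance(a, b), max_type)


def build_edges(codes, max_type=4):
    """One directed, typed edge for every ordered pair of distinct codes."""
    return {
        (src, dst): edge_type(src, dst, max_type)
        for src, dst in permutations(codes, 2)
    }


edges = build_edges(["A1", "A2", "B1"])
```

In this toy tree, sibling codes such as `A1` and `A2` sit two ontology edges apart, while `A1` and `B1` only meet at the root, so the two pairs receive different edge types. Capping the distance keeps the number of edge types small; whether and where the paper applies such a cap is not stated in this summary.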