Probability-Biased Attention over Directed Bipartite Graphs for Long-Tail ICD Coding

Tianlei Chen, Yuxiao Chen, Yang Li, Feifei Wang

TL;DR

This work tackles the long-tail ICD coding problem with tens of thousands of codes by introducing ProBias, a Directed Bipartite Graph Encoder that transfers information from common to rare codes using a probability-based bias derived from conditional co-occurrence $P(l_{c_j}|l_{r_i})$. It combines LLM-generated, clinically rich code descriptions with a graph-attention mechanism that injects discretized co-occurrence biases into attention, thereby enriching rare-code representations. Empirical results on MIMIC-III-ICD-9, MIMIC-IV-ICD-9, and MIMIC-IV-ICD-10 demonstrate state-of-the-art Macro-F1 scores, highlighting strong gains on long-tail codes while maintaining overall accuracy. The approach offers a scalable, knowledge-rich path to improve automated ICD coding, with potential impact on coding efficiency and fairness across rare conditions; future work explores continuous bias mappings and gating mechanisms to fuse descriptions with ontologies.

Abstract

Automated International Classification of Diseases (ICD) coding assigns multiple disease codes to a clinical document, making it a crucial multi-label text classification task in healthcare informatics. The task is challenging because of its large label space (10,000 to 20,000 codes) and long-tail distribution: a few codes dominate, while many rare codes lack sufficient training data. To address this, we propose a learning method that models fine-grained co-occurrence relationships among codes. Specifically, we construct a Directed Bipartite Graph Encoder with disjoint sets of common and rare code nodes. To enforce one-way information flow, edges are directed exclusively from common to rare codes. Each edge is characterized by a probability-based bias, derived from the conditional probability that a common code co-occurs given the presence of a rare code. This bias is injected into the encoder's attention module, a process we term Co-occurrence Encoding. This structure lets the graph encoder enrich rare code representations by aggregating latent comorbidity information reflected in the statistical co-occurrence of their common counterparts. To ensure high-quality input to the graph, we use a large language model (LLM) to generate comprehensive descriptions for codes. These descriptions enrich the initial embeddings with clinical context and comorbidity information and serve as external knowledge for the statistical co-occurrence relationships in the code system. Experiments on three automated ICD coding benchmark datasets show that our method achieves state-of-the-art performance, with particularly notable improvements in Macro-F1, the key metric for long-tail classification.
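
The Co-occurrence Encoding idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the conditional probability of a common code given a rare code is estimated by simple counting, that the probability is discretized into equal-width buckets, and that each bucket indexes a scalar bias added to a scaled dot-product attention score. The function names, the bucket count, the toy label sets, and the fixed `bias_table` values (which stand in for learnable parameters) are all hypothetical.

```python
import math
from collections import Counter


def conditional_cooccurrence(label_sets, common, rare):
    """Estimate P(common code present | rare code present) by counting.

    Returns a matrix with one row per rare code and one column per common code.
    """
    rare_count = Counter()
    pair_count = Counter()
    for labels in label_sets:
        present = set(labels)
        for r in rare:
            if r in present:
                rare_count[r] += 1
                for c in common:
                    if c in present:
                        pair_count[(r, c)] += 1
    return [
        [pair_count[(r, c)] / rare_count[r] if rare_count[r] else 0.0 for c in common]
        for r in rare
    ]


def discretize(prob, num_buckets):
    """Map a probability in [0, 1] to an equal-width bucket index (assumed scheme)."""
    return min(int(prob * num_buckets), num_buckets - 1)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(rare_vec, common_vecs, probs_row, bias_table):
    """One rare node attends over common nodes (edges point common -> rare).

    Each score is a scaled dot product plus the bias of the bucket that the
    conditional probability falls into (additive injection is an assumption).
    """
    dim = len(rare_vec)
    scores = []
    for c_vec, prob in zip(common_vecs, probs_row):
        dot = sum(q * k for q, k in zip(rare_vec, c_vec)) / math.sqrt(dim)
        scores.append(dot + bias_table[discretize(prob, len(bias_table))])
    weights = softmax(scores)
    # The updated rare representation is the attention-weighted sum of common vectors.
    updated = [sum(w * v[i] for w, v in zip(weights, common_vecs)) for i in range(dim)]
    return weights, updated


# Toy data: placeholder code names and embeddings, not real ICD statistics.
label_sets = [
    ["rare_1", "common_1", "common_2"],
    ["rare_1", "common_1"],
    ["common_2"],
    ["common_1", "common_2"],
]
common, rare = ["common_1", "common_2"], ["rare_1"]
probs = conditional_cooccurrence(label_sets, common, rare)
bias_table = [-1.0, -0.5, 0.0, 0.5]  # stands in for learnable per-bucket biases
weights, updated = biased_attention([0.2, 0.1], [[1.0, 0.0], [0.0, 1.0]], probs[0], bias_table)
print(probs, weights, updated)
```

In this sketch the rare node acts as the query and the common nodes act as keys and values, so information moves only from common to rare codes, matching the one-way flow the abstract describes.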

Paper Structure

This paper contains 21 sections, 9 equations, 5 figures, 9 tables, 1 algorithm.

Figures (5)

  • Figure 1: An example of a clinical document assigned the co-occurring codes "Urea cycle disorder", "Acidosis", and others. "Acidosis" is often triggered by "Urea cycle disorder", as toxic ammonia buildup disrupts cellular metabolism.
  • Figure 2: The pipeline of our proposed model, ProBias. The process begins with generating comprehensive code descriptions via a Large Language Model (LLM). Next, a Directed Bipartite Graph is constructed based on the Original Code Co-occurrence Relationships and Conditional Probability Matrix, in which arrows of different colors denote different probabilities. The information flow within this graph is governed by our Co-occurrence Encoding, which injects a learnable bias derived from the Conditional Probability Matrix into the attention scores. The final classification is then performed using a Co-occurrence-Infused Multi-Label Attention mechanism.
  • Figure 3: A structured prompt template designed to produce a comprehensive description for each ICD code, covering clinical contexts, procedural methods, and comorbidity information.
  • Figure 4: The description of code 270.6 generated by GPT-4o.
  • Figure 5: The power-law label distribution of the three benchmark datasets. Label IDs are sorted in descending order by the number of related documents.
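
The ranking behind Figure 5 can be sketched in a few lines. The snippet below counts how many documents carry each label, sorts labels by descending document count, and then partitions them into common and rare sets. The `min_docs` cutoff and the toy labels are hypothetical: the text above does not state how the paper draws the boundary between common and rare codes.

```python
from collections import Counter


def rank_labels_by_frequency(label_sets):
    """Return (label, document_count) pairs sorted by descending count.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter()
    for labels in label_sets:
        counts.update(set(labels))  # count each label at most once per document
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_common_rare(ranked, min_docs):
    """Partition ranked labels using a hypothetical document-count cutoff."""
    common = [label for label, n in ranked if n >= min_docs]
    rare = [label for label, n in ranked if n < min_docs]
    return common, rare


# Toy label sets standing in for per-document ICD code assignments.
documents = [["a", "b"], ["a"], ["a", "c"], ["b"]]
ranked = rank_labels_by_frequency(documents)
common_labels, rare_labels = split_common_rare(ranked, min_docs=2)
print(ranked, common_labels, rare_labels)
```

The two disjoint sets produced this way correspond to the two node sets of a bipartite graph, under the assumption that frequency alone decides membership.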