Probability-Biased Attention over Directed Bipartite Graphs for Long-Tail ICD Coding
Tianlei Chen, Yuxiao Chen, Yang Li, Feifei Wang
TL;DR
This work tackles long-tail ICD coding over tens of thousands of codes by introducing ProBias, a Directed Bipartite Graph Encoder that transfers information from common to rare codes through a probability-based bias derived from the conditional co-occurrence probability $P(l_{c_j} \mid l_{r_i})$, i.e., the probability that the $j$-th common code is present given that the $i$-th rare code is present. It combines LLM-generated, clinically rich code descriptions with a graph-attention mechanism that injects discretized co-occurrence biases into attention, thereby enriching rare-code representations. Results on MIMIC-III-ICD-9, MIMIC-IV-ICD-9, and MIMIC-IV-ICD-10 show state-of-the-art Macro-F1 scores, with strong gains on long-tail codes while overall accuracy is maintained. The approach offers a scalable, knowledge-rich path to improving automated ICD coding, with potential impact on coding efficiency and on fairness across rare conditions. Future work includes continuous bias mappings and gating mechanisms for fusing descriptions with ontologies.
Abstract
Automated International Classification of Diseases (ICD) coding assigns multiple disease codes to a clinical document, making it a crucial multi-label text classification task in healthcare informatics. The task is challenging because of its large label space (10,000 to 20,000 codes) and its long-tail distribution, in which a few codes dominate while many rare codes lack sufficient training data. To address this, we propose a learning method that models fine-grained co-occurrence relationships among codes. Specifically, we construct a Directed Bipartite Graph Encoder over two disjoint sets of nodes, common codes and rare codes. Edges are directed exclusively from common to rare codes, so information flows one way. Each edge is characterized by a probability-based bias derived from the conditional probability that a common code co-occurs given the presence of a rare code. This bias is injected into the encoder's attention module, a process we term Co-occurrence Encoding. This structure lets the graph encoder enrich rare-code representations by aggregating the latent comorbidity information reflected in their statistical co-occurrence with common codes. To ensure high-quality input to the graph, we use a large language model (LLM) to generate comprehensive code descriptions, which enrich the initial embeddings with clinical context and comorbidity information and serve as external knowledge complementing the statistical co-occurrence relationships in the code system. Experiments on three automated ICD coding benchmark datasets show that our method achieves state-of-the-art performance, with particularly notable improvements in Macro-F1, the key metric for long-tail classification.
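To make the Co-occurrence Encoding idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It estimates the conditional probability of each common code given each rare code from a binary label matrix, discretizes it into buckets, and adds a per-bucket scalar bias to the attention scores with which rare codes (queries) attend to common codes (keys and values). The function names, the uniform bucketing scheme, the fixed bias table (learnable in a real model), and the omission of query/key/value projections are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_prob(Y, common_idx, rare_idx):
    """Estimate P(common_j | rare_i) from a binary label matrix Y (docs x codes)."""
    Y = np.asarray(Y, dtype=float)
    rare = Y[:, rare_idx]        # (n_docs, n_rare)
    common = Y[:, common_idx]    # (n_docs, n_common)
    joint = rare.T @ common      # co-occurrence counts, (n_rare, n_common)
    rare_count = rare.sum(axis=0)[:, None]
    # Guard against rare codes that never appear (assumed convention: P = 0).
    return joint / np.maximum(rare_count, 1.0)

def discretize(P, n_buckets):
    """Map probabilities in [0, 1] to bucket ids in [0, n_buckets - 1] (uniform bins, assumed)."""
    return np.minimum((P * n_buckets).astype(int), n_buckets - 1)

def biased_cross_attention(H_rare, H_common, bucket_ids, bias_table):
    """Rare codes attend to common codes; a per-bucket bias is added to the scores."""
    d = H_rare.shape[1]
    scores = H_rare @ H_common.T / np.sqrt(d)      # (n_rare, n_common)
    scores = scores + bias_table[bucket_ids]       # inject the co-occurrence bias
    scores -= scores.max(axis=1, keepdims=True)    # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # One-way flow: only rare-code representations are updated.
    return weights @ H_common, weights

# Toy example: codes 0 and 1 are common, code 2 is rare.
Y = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 1, 0],
              [1, 0, 0]])
P = cooccurrence_prob(Y, common_idx=[0, 1], rare_idx=[2])   # [[1.0, 0.5]]
buckets = discretize(P, n_buckets=4)                        # [[3, 2]]
bias_table = np.array([0.0, 0.5, 1.0, 1.5])                 # hypothetical values
H_common = np.eye(2)
H_rare = np.zeros((1, 2))
updated_rare, attn = biased_cross_attention(H_rare, H_common, buckets, bias_table)
print(attn)
```

In the toy example the rare code always co-occurs with common code 0 and only half the time with common code 1, so the bias shifts attention toward code 0 even though the content-based scores are identical.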
