From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding

Xu Zhang, Wenxin Ma, Chenxu Wu, Rongsheng Wang, Kun Zhang, S. Kevin Zhou

Abstract

ICD coding is a critical yet challenging task in healthcare. Recently, LLM-based methods have demonstrated stronger generalization than discriminative methods in ICD coding. However, fine-tuning LLMs for ICD coding faces three major challenges. First, existing public ICD coding datasets provide limited coverage of the ICD code space, restricting a model's ability to generalize to unseen codes. Second, naive fine-tuning diminishes the interpretability of LLMs, as few public datasets contain explicit supporting evidence for assigned codes. Third, ICD coding typically involves long clinical documents, making fine-tuning LLMs computationally expensive. To address these issues, we propose Code-Centric Learning, a training framework that shifts supervision from full clinical documents to scalable, short evidence spans. The key idea of this framework is that span-level learning improves LLMs' ability to perform document-level ICD coding. Our proposed framework consists of a mixed training strategy and code-centric data expansion, which together substantially reduce training cost, improve accuracy on unseen ICD codes, and preserve interpretability. Under the same LLM backbone, our method substantially outperforms strong baselines. Notably, our method enables small-scale LLMs to achieve performance comparable to much larger proprietary models, demonstrating its effectiveness and potential for fully automated ICD coding.

Paper Structure (28 sections, 8 equations, 4 figures, 12 tables)

Figures (4)

  • Figure 1: (a) The traditional paradigm relies on large-scale documents for training, which is inefficient, limited in code coverage, and lacks interpretability. (b) Our method uses only 200 documents to learn evidence-based ICD coding, while leveraging scalable spans to learn code knowledge. (c) Our approach substantially improves accuracy on codes unseen in documents, provides interpretable evidence, and reduces training time.
  • Figure 2: Overview of the Code-Centric Learning framework. Under mixed training, document-level data enables the LLM to aggregate evidence from the full context and assign multiple ICD codes, while span-level data provides the LLM with code-specific knowledge. Code-centric data expansion leverages LLMs to extract and infer evidence spans for each code from diverse knowledge sources, addressing codes not present in documents.
  • Figure 3: (a) Test-set codes are partitioned into seen and unseen codes, according to whether they occur in the document-level training data. (b) Sources of spans constructed for unseen codes, with the proportions of the three strategies. (c) Our method improves coding accuracy on unseen codes using spans only, without additional documents. (d) Each strategy contributes to improved coding accuracy.
  • Figure 4: An example from the test set. Code-only methods cannot generate evidence, making the results difficult for humans to evaluate and revise. Evidence-based methods suffer from limited evidence-annotated data for fine-tuning, and therefore achieve lower accuracy. Our method balances interpretability and accuracy, producing evidence and ICD codes that are highly consistent with human annotations.
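
The seen/unseen split described in the Figure 3(a) caption can be illustrated with a minimal sketch. It assumes each document is represented only by its set of assigned ICD codes; the function name `partition_codes` and the toy label sets are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple


def partition_codes(
    train_docs: Iterable[Set[str]],
    test_docs: Iterable[Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Split test-set codes by whether they occur in document-level training labels.

    Hypothetical helper: each document is given as the set of ICD codes assigned to it.
    """
    # Every code that appears in at least one training document.
    train_codes = {code for doc in train_docs for code in doc}
    # Every code that appears in at least one test document.
    test_codes = {code for doc in test_docs for code in doc}
    seen = test_codes & train_codes    # also present in training documents
    unseen = test_codes - train_codes  # never observed in training documents
    return seen, unseen


# Toy label sets for illustration only (not data from the paper).
train_docs = [{"I10", "E11.9"}, {"I10", "J18.9"}]
test_docs = [{"I10", "N17.9"}, {"E11.9", "K21.9"}]

seen, unseen = partition_codes(train_docs, test_docs)
print(sorted(seen))
print(sorted(unseen))
```

Under this reading, the unseen set contains the codes that document-level supervision never covers, which is the gap the span-level data is meant to address.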