Zero-Shot ATC Coding with Large Language Models for Clinical Assessments

Zijian Chen, John-Michael Gamble, Micaela Jantzi, John P. Hirdes, Jimmy Lin

TL;DR

This work addresses the bottleneck of manual ATC coding in privacy-sensitive healthcare by framing ATC assignment as hierarchical information extraction guided by the ATC ontology. It prompts GPT-4o and open-source Llama models level by level and augments predictions with UMLS-based grounding. The study introduces a gold-standard dataset and evaluates on Health Canada, RABBITS, and Ontario Health data, reporting 78% exact-match accuracy with GPT-4o and 60% with Llama 3.1 70B in zero-shot settings. A fine-tuned Llama 3.1 8B matches the zero-shot accuracy of Llama 3.1 70B, and grounding yields modest gains deeper in the hierarchy. Overall, the work demonstrates the feasibility of privacy-preserving automatic ATC coding and provides practical insights for deployment in real-world healthcare environments.

Abstract

Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). Inspired by recent advances in automatic International Classification of Diseases (ICD) coding, our method frames ATC coding as a hierarchical information extraction task, guiding LLMs through the ATC ontology level by level. We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment. Testing across Health Canada drug product data, the RABBITS benchmark, and real clinical notes from Ontario Health, our method achieves 78% exact match accuracy with GPT-4o and 60% with Llama 3.1 70B. We investigate knowledge grounding through drug definitions, finding modest improvements in accuracy. Further, we show that fine-tuned Llama 3.1 8B matches zero-shot Llama 3.1 70B accuracy, suggesting that effective ATC coding is feasible with smaller models. Our results demonstrate the feasibility of automatic ATC coding in privacy-sensitive healthcare environments, providing a foundation for future deployments.
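
The level-by-level idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `ATC_CHILDREN` table is a tiny hand-picked slice of the ATC hierarchy, and `keyword_chooser` is a hypothetical stand-in for the LLM call, used only so the example runs offline. The names `code_level_by_level` and `Chooser` are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Toy slice of the ATC hierarchy: parent code -> {child code: name}.
# "" denotes the root. A real system would load the full ontology.
ATC_CHILDREN: Dict[str, Dict[str, str]] = {
    "": {"A": "Alimentary tract and metabolism", "C": "Cardiovascular system"},
    "A": {"A02": "Drugs for acid related disorders", "A10": "Drugs used in diabetes"},
    "A10": {
        "A10A": "Insulins and analogues",
        "A10B": "Blood glucose lowering drugs, excl. insulins",
    },
    "A10B": {"A10BA": "Biguanides", "A10BB": "Sulfonylureas"},
    "A10BA": {"A10BA02": "metformin"},
}

# A chooser receives the drug mention and the valid options at the current
# level, and returns one option code (or anything else to signal "no answer").
Chooser = Callable[[str, Dict[str, str]], str]


def code_level_by_level(mention: str, choose: Chooser, max_levels: int = 5) -> List[str]:
    """Walk the hierarchy top-down, offering only children of the last pick."""
    path: List[str] = []
    current = ""
    for _ in range(max_levels):
        options = ATC_CHILDREN.get(current)
        if not options:
            break  # no deeper level known for this branch
        picked = choose(mention, options)
        if picked not in options:
            break  # reject answers outside the valid option set
        path.append(picked)
        current = picked
    return path


def keyword_chooser(mention: str, options: Dict[str, str]) -> str:
    """Hypothetical stand-in for an LLM: looks up a hard-coded path."""
    hints = {"metformin": ["A", "A10", "A10B", "A10BA", "A10BA02"]}
    for known, codes in hints.items():
        if known in mention.lower():
            for code in codes:
                if code in options:
                    return code
    return ""


print(code_level_by_level("Metformin 500 mg tablet", keyword_chooser))
# prints ['A', 'A10', 'A10B', 'A10BA', 'A10BA02']
```

In a deployment along the lines the abstract describes, `choose` would wrap a call to a locally hosted model. Restricting each step to the children of the previous selection keeps every prediction a valid ATC prefix, at the cost that an error at a shallow level cannot be recovered deeper down.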

Paper Structure

This paper contains 23 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Prompt template used at each level of the ATC hierarchy. The LLM is presented with all valid options for the current level, based on the selection from the previous level.
  • Figure 2: Examples of drug mentions and their corresponding ATC codes at each level on the Health Canada product names, RABBITS product names, and the Ontario Health assessments. Each ATC code is followed by its generic name, as in the "With Name" setting.
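
Since Figure 2 lists the ATC code at each level, it may help to see how a full code decomposes into its five level prefixes and how per-level agreement could be scored. The sketch below relies on the standard ATC code layout (1, 3, 4, 5, and 7 characters for levels 1 to 5). The function names and the toy gold/predicted pairs are assumptions for illustration only; they are not the paper's data, and the paper's exact metric definitions may differ.

```python
from typing import Dict, List, Tuple

# Character lengths of the ATC code prefix at levels 1 through 5,
# e.g. A / A10 / A10B / A10BA / A10BA02.
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> List[str]:
    """Split an ATC code into the prefixes it is long enough to contain."""
    code = code.strip().upper()
    return [code[:n] for n in ATC_PREFIX_LENGTHS if len(code) >= n]


def level_accuracy(pairs: List[Tuple[str, str]]) -> Dict[int, float]:
    """Fraction of (gold, predicted) pairs whose prefixes agree at each level.

    Agreement at level 5 is the same as an exact match on the full code.
    """
    if not pairs:
        raise ValueError("need at least one (gold, predicted) pair")
    hits = [0] * len(ATC_PREFIX_LENGTHS)
    for gold, pred in pairs:
        g, p = atc_levels(gold), atc_levels(pred)
        for i in range(len(ATC_PREFIX_LENGTHS)):
            # A prediction that stops early counts as a miss at deeper levels.
            if i < len(g) and i < len(p) and g[i] == p[i]:
                hits[i] += 1
    return {i + 1: hits[i] / len(pairs) for i in range(len(hits))}


# Made-up (gold, predicted) pairs, purely to exercise the functions.
toy_pairs = [
    ("A10BA02", "A10BA02"),
    ("A10BA02", "A10BB01"),
    ("C09AA02", "C10AA05"),
    ("N02BE01", "N02BE01"),
]

print(atc_levels("A10BA02"))
# prints ['A', 'A10', 'A10B', 'A10BA', 'A10BA02']
print(level_accuracy(toy_pairs))
# prints {1: 1.0, 2: 0.75, 3: 0.75, 4: 0.5, 5: 0.5}
```

Because each level's prefix contains the previous one, accuracy can only stay flat or drop as the level deepens, which is why errors tend to accumulate toward level 5.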