Zero-Shot ATC Coding with Large Language Models for Clinical Assessments

Zijian Chen; John-Michael Gamble; Micaela Jantzi; John P. Hirdes; Jimmy Lin

TL;DR

This work addresses the bottleneck of manual Anatomical Therapeutic Chemical (ATC) coding in privacy-sensitive healthcare by framing ATC assignment as hierarchical information extraction guided by the ATC ontology. It prompts both GPT-4o and open-source Llama models level by level through the ontology, and enhances predictions with UMLS-based grounding. The study introduces a gold-standard dataset and evaluates on Health Canada drug product data, the RABBITS benchmark, and Ontario Health clinical notes. In zero-shot settings, GPT-4o reaches 78% exact-match accuracy and Llama 3.1 70B reaches 60%; a fine-tuned Llama 3.1 8B matches the zero-shot accuracy of Llama 3.1 70B. Grounding yields modest gains deeper in the hierarchy. Overall, the work demonstrates the feasibility of privacy-preserving automatic ATC coding and provides practical insights for deployment in real-world healthcare environments.

Abstract

Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). Inspired by recent advances in automatic International Classification of Diseases (ICD) coding, our method frames ATC coding as a hierarchical information extraction task, guiding LLMs through the ATC ontology level by level. We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment. Testing across Health Canada drug product data, the RABBITS benchmark, and real clinical notes from Ontario Health, our method achieves 78% exact match accuracy with GPT-4o and 60% with Llama 3.1 70B. We investigate knowledge grounding through drug definitions, finding modest improvements in accuracy. Further, we show that fine-tuned Llama 3.1 8B matches zero-shot Llama 3.1 70B accuracy, suggesting that effective ATC coding is feasible with smaller models. Our results demonstrate the feasibility of automatic ATC coding in privacy-sensitive healthcare environments, providing a foundation for future deployments.
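To make the level-by-level framing concrete, the following is a minimal sketch and not the authors' implementation. It relies only on the standard five-level structure of ATC codes (for example, A10BA02 for metformin decomposes into A, A10, A10B, A10BA, A10BA02). The function names, the toy two-branch ontology, and the `oracle_choose` stub that stands in for an LLM call are all hypothetical; no prompt wording or model API from the paper is reproduced here.

```python
from typing import Callable, Dict, List

# Cumulative prefix lengths of the five ATC levels:
# A10BA02 -> A / A10 / A10B / A10BA / A10BA02.
ATC_LEVEL_LENGTHS = [1, 3, 4, 5, 7]

# Toy ontology (hypothetical subset): code -> label.
TOY_ONTOLOGY: Dict[str, str] = {
    "A": "Alimentary tract and metabolism",
    "A10": "Drugs used in diabetes",
    "A10B": "Blood glucose lowering drugs, excl. insulins",
    "A10BA": "Biguanides",
    "A10BA02": "metformin",
    "C": "Cardiovascular system",
    "C10": "Lipid modifying agents",
    "C10A": "Lipid modifying agents, plain",
    "C10AA": "HMG CoA reductase inhibitors",
    "C10AA05": "atorvastatin",
}


def atc_prefixes(code: str) -> List[str]:
    """Return the prefix of a full 7-character ATC code at each level."""
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS]


def children(ontology: Dict[str, str], parent: str) -> List[str]:
    """List ontology codes exactly one level below `parent` ('' is the root)."""
    if parent == "":
        target = ATC_LEVEL_LENGTHS[0]
    else:
        idx = ATC_LEVEL_LENGTHS.index(len(parent))
        if idx + 1 >= len(ATC_LEVEL_LENGTHS):
            return []  # already at the deepest level
        target = ATC_LEVEL_LENGTHS[idx + 1]
    return sorted(c for c in ontology if len(c) == target and c.startswith(parent))


def code_level_by_level(
    drug: str,
    ontology: Dict[str, str],
    choose: Callable[[str, Dict[str, str]], str],
) -> str:
    """Descend the ontology one level at a time, asking `choose` at each step."""
    current = ""
    while True:
        options = children(ontology, current)
        if not options:
            return current
        choice = choose(drug, {c: ontology[c] for c in options})
        if choice not in options:
            raise ValueError(f"chooser returned {choice!r}, not one of {options}")
        current = choice


def levels_matched(pred: str, gold: str) -> int:
    """Count how many leading ATC levels agree between two full codes."""
    matched = 0
    for p, g in zip(atc_prefixes(pred), atc_prefixes(gold)):
        if p != g:
            break
        matched += 1
    return matched


# Hypothetical stand-in for the LLM: looks up a known answer and picks the
# candidate on the path to it. A real system would prompt a model instead.
_KNOWN = {"metformin": "A10BA02", "atorvastatin": "C10AA05"}


def oracle_choose(drug: str, candidates: Dict[str, str]) -> str:
    gold = _KNOWN[drug]
    return next(code for code in candidates if gold.startswith(code))


if __name__ == "__main__":
    print(code_level_by_level("metformin", TOY_ONTOLOGY, oracle_choose))
```

Restricting each step to the children of the previously chosen node keeps the candidate list short, and `levels_matched` illustrates how a prediction can be scored per level in addition to exact match on the full code.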
