Fine-Tuning LLMs on Small Medical Datasets: Text Classification and Normalization Effectiveness on Cardiology reports and Discharge records

Noah Losch, Lucas Plagwitz, Antonius Büscher, Julian Varghese

TL;DR

This study investigates locally fine-tuning small LLMs on privacy-sensitive, small medical datasets to perform text classification and named entity recognition. Using Llama3-8b-instruct with LoRA adapters via the Axolotl framework, the authors evaluate English i2b2 Smoking Challenge data and German cardiology reports, with training sizes as small as 200–300 examples. The results show that such fine-tuning can outperform zero-shot baselines in classification and reach parity with, or surpass, larger open-source models for NER, while improving the reliability of machine-readable outputs. The findings suggest that task-specific, locally trained LLMs offer a practical, privacy-preserving route to automating clinical workflows and extracting structured information from unstructured medical text.

Abstract

We investigate the effectiveness of fine-tuning large language models (LLMs) on small medical datasets for text classification and named entity recognition tasks. Using a German cardiology report dataset and the i2b2 Smoking Challenge dataset, we demonstrate that fine-tuning small LLMs locally on limited training data can improve performance, achieving results comparable to those of larger models. Our experiments show that fine-tuning improves performance on both tasks, with notable gains observed with as few as 200–300 training examples. Overall, the study highlights the potential of task-specific fine-tuning of LLMs for automating clinical workflows and efficiently extracting structured data from unstructured medical text.

Paper Structure

This paper contains 11 sections, 2 tables.