Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain

Aryo Pradipta Gema; Pasquale Minervini; Luke Daines; Tom Hope; Beatrice Alex

TL;DR

The paper tackles the expensive process of adapting large language models to clinical settings by proposing a two-step parameter-efficient fine-tuning (PEFT) framework: Clinical LLaMA-LoRA (CL-LLaMA-LoRA) for domain adaptation, followed by Downstream LLaMA-LoRA for task-specific fine-tuning. It demonstrates that a small, domain-focused PEFT adapter can improve AUROC across multiple clinical downstream tasks, including large-scale multilabel diagnoses and procedures classification, while reducing training time and computational requirements. The study provides an extensive empirical comparison of LoRA and other PEFT methods, showing that a trainable CL-LLaMA-LoRA, especially when combined with Downstream LLaMA-LoRA, yields the best macro-averaged AUROC scores and can outperform some clinically trained language models. Overall, the framework offers a practical, resource-efficient pathway to deploying clinical LLMs with strong predictive performance, and the study highlights limitations related to data diversity and potential spurious correlations.
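The TL;DR reports results as macro-averaged AUROC. As a minimal sketch of what that metric computes (not the authors' evaluation code), the Python below scores each label's AUROC as the probability that a randomly chosen positive example is ranked above a randomly chosen negative one (ties count as one half), then takes the unweighted mean over labels. Skipping labels that contain only one class in the evaluation set is an assumption made here for illustration.

```python
from typing import List, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability that a positive outscores a negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Unweighted mean of per-label AUROCs for a multilabel task.

    Assumption for this sketch: labels with only positives or only negatives
    are skipped, because AUROC is undefined for them.
    """
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_score = [row[j] for row in y_score]
        if 0 < sum(col_true) < len(col_true):
            per_label.append(auroc(col_true, col_score))
    if not per_label:
        raise ValueError("no label has both positive and negative examples")
    return sum(per_label) / len(per_label)


if __name__ == "__main__":
    # Toy data: 4 notes, 2 labels (e.g. two diagnosis codes).
    labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
    print(macro_auroc(labels, scores))  # label AUROCs 1.0 and 0.75 -> 0.875
```

Macro averaging gives every label equal weight, so rare diagnosis or procedure codes count as much as common ones. The pairwise loop is quadratic in the number of examples and is written for clarity rather than for large evaluation sets.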

Abstract

Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. Parameter-Efficient Fine-Tuning (PEFT) techniques significantly reduce the computational requirements of this adaptation by fine-tuning only a small subset of parameters. In this study, we propose a two-step PEFT framework and evaluate it in the clinical domain. Our approach combines a specialised PEFT adapter layer designed for clinical domain adaptation with a second adapter specialised for downstream tasks. We evaluate the framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Our framework achieves a higher AUROC score, averaged across all clinical downstream tasks, than the clinical language models. In particular, we observe large improvements of 4-5% AUROC in large-scale multilabel classification tasks, such as diagnoses and procedures classification. To our knowledge, this study is the first to provide an extensive empirical analysis of the interplay between PEFT techniques and domain adaptation in an important real-world domain of clinical applications.
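The framework builds on LoRA, which freezes a pretrained weight matrix W0 and learns a low-rank update (alpha / r) · B · A. A minimal sketch, assuming both adapters (the clinical one and the downstream one) attach to the same frozen projection and their updates are simply summed; the names `LoRAAdapter` and `forward` are hypothetical, and this plain-Python version is an illustration rather than the authors' implementation:

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRAAdapter:
    """Hypothetical rank-r adapter computing (alpha / r) * B @ A @ x."""

    def __init__(self, d_in: int, d_out: int, r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        self.scale = alpha / r
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so a freshly attached adapter leaves the frozen output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def delta(self, x: Vector) -> Vector:
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


def forward(w0: Matrix, adapters: List[LoRAAdapter], x: Vector) -> Vector:
    """Frozen base projection W0 @ x plus every attached adapter's update."""
    out = matvec(w0, x)
    for adapter in adapters:
        out = [o + d for o, d in zip(out, adapter.delta(x))]
    return out


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # frozen weight, d_out=3, d_in=2
    clinical = LoRAAdapter(2, 3, r=1, alpha=2.0, seed=1)    # step 1: domain adaptation
    downstream = LoRAAdapter(2, 3, r=1, alpha=2.0, seed=2)  # step 2: task fine-tuning
    print(forward(w0, [clinical, downstream], [2.0, 3.0]))  # [2.0, 3.0, 5.0]
```

Because each B starts at zero, attaching a new adapter does not change the base model's output until training moves it. Whether the clinical adapter stays frozen or keeps training during the downstream step is a configuration choice this sketch leaves open.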

Paper Structure

This paper contains 33 sections, 5 equations, 2 figures and 8 tables.

Introduction
Background
Biomedical Large Language Models
Clinical Large Language Models
Parameter-Efficient Fine-Tuning for Large Language Models
Multi-step Adaptation
Methodology
Problem Statement
Domain-adaptive Pretraining.
Downstream Fine-tuning.
Two-step LLaMA-LoRA
LLaMA models
Domain-adaptive Pretraining: Clinical LLaMA-LoRA
Downstream Fine-tuning: Downstream LLaMA-LoRA
Baseline Models
...and 18 more sections

Figures (2)

Figure 1: An illustration of the proposed two-step PEFT framework. Clinical LLaMA-LoRA fine-tunes the pretrained LLaMA to the clinical domain. Downstream LLaMA-LoRA further fine-tunes the domain-adapted model to downstream clinical tasks.
Figure 2: Frameworks of domain-adaptive and downstream fine-tuning for adapting a pretrained LLM from the general domain to the clinical domain. Unlike full fine-tuning, which can be prohibitively expensive (left), our approach uses PEFT techniques to attach a clinically specialised adapter to a pretrained general LLM (right). The proposed framework also introduces a second clinical PEFT adapter trained on downstream clinical tasks, such as clinical note classification.
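Figure 2 contrasts full fine-tuning with attaching a small adapter. For a single d_out × d_in weight matrix, full fine-tuning updates d_out · d_in parameters, whereas a rank-r LoRA pair updates only r · (d_in + d_out). The sketch below works this out for a 4096 × 4096 projection (the hidden size of LLaMA-7B) with a hypothetical rank of 8; the rank and target matrices used in the paper may differ.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_in * d_out


def lora_update_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in a rank-r LoRA pair: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    d = 4096  # hidden size of LLaMA-7B
    r = 8     # hypothetical rank, chosen for illustration only
    full = full_update_params(d, d)
    lora = lora_update_params(d, d, r)
    print(full, lora, lora / full)  # 16777216 65536 0.00390625
```

For this matrix the adapter trains 1/256 (about 0.39%) of the parameters that full fine-tuning would update, and the saving repeats for every projection the adapter is attached to.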