Exploring the Effectiveness of Instruction Tuning in Biomedical Language Processing

Omid Rohanian; Mohammadmahdi Nouriborji; David A. Clifton

TL;DR

The paper addresses how to adapt general-purpose autoregressive LLMs to biomedical NLP by instruction-tuning two Llama2 variants (7B and 13B) on a medical instruction dataset of roughly 200K samples. It introduces the Llama2-MedTuned models and a unified medical instruction dataset (Llama2-MedTuned-Instructions) assembled from NER, RE, NLI, document classification, and QA tasks; the samples are formatted with Alpaca-style prompts and the models are trained for three epochs. Results show better-structured outputs and notable gains on MedNLI with the 13B model (e.g., accuracy rising from $37.20$ to $89.46$), along with performance competitive with DistilBERT/BioBERT on several biomedical tasks, though the tuned models do not uniformly surpass encoder-only baselines. Ablation studies show how dataset composition affects performance, and the authors release open-source code, models, and data to support further research on biomedical instruction tuning.

Abstract

Large Language Models (LLMs), particularly those similar to ChatGPT, have significantly influenced the field of Natural Language Processing (NLP). While these models excel in general language tasks, their performance in domain-specific downstream tasks such as biomedical and clinical Named Entity Recognition (NER), Relation Extraction (RE), and Medical Natural Language Inference (NLI) is still evolving. In this context, our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive, instruction-based model trained on a dataset that consists of approximately $200,000$ instruction-focused samples. This dataset represents a carefully curated compilation of existing data, meticulously adapted and reformatted to align with the specific requirements of our instruction-based tasks. This initiative represents an important step in utilising such models to achieve results on par with specialised encoder-only models like BioBERT and BioClinicalBERT for various classical biomedical NLP tasks. Our work includes an analysis of the dataset's composition and its impact on model performance, providing insights into the intricacies of instruction tuning. By sharing our code, models, and the distinctively assembled instruction-based dataset, we seek to encourage ongoing research and development in this area.
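
To make the reformatting step concrete, the following is a minimal sketch of how a token-labelled NER example might be turned into an Alpaca-style instruction sample. The template text is the widely used Stanford Alpaca prompt; the exact wording used for Llama2-MedTuned-Instructions may differ. The helper names (`build_prompt`, `build_training_text`, `ner_sample`), the instruction wording, the `token : label` response format, and the example labels are all assumptions made for illustration, not the authors' implementation.

```python
# Standard Stanford Alpaca prompt (with-input variant). Whether the paper
# uses this exact wording is an assumption.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the template; the model is expected to continue after '### Response:'."""
    return ALPACA_TEMPLATE.format(
        instruction=instruction.strip(), input=input_text.strip()
    )


def build_training_text(instruction: str, input_text: str, response: str) -> str:
    """Prompt plus gold response, i.e. one supervised fine-tuning example."""
    return build_prompt(instruction, input_text) + response.strip()


def ner_sample(tokens: list[str], labels: list[str]) -> dict[str, str]:
    """Hypothetical conversion of a token-level NER example to an instruction sample."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    # Illustrative output format: one 'token : label' pair per line.
    response = "\n".join(f"{tok} : {lab}" for tok, lab in zip(tokens, labels))
    return {
        "instruction": "Label each token in the input with its entity tag.",
        "input": " ".join(tokens),
        "output": response,
    }


if __name__ == "__main__":
    sample = ner_sample(
        ["Aspirin", "reduces", "fever"], ["B-Chemical", "O", "B-Disease"]
    )
    print(build_training_text(sample["instruction"], sample["input"], sample["output"]))
```

Keeping the prompt builder separate from the training-text builder means the same template can be reused at inference time, where only the prompt is supplied and the response is generated.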

Paper Structure (19 sections, 7 figures, 4 tables)

Introduction
Related Works
Autoregressive Language Models
Instruction-Based Language Models
Clinical LLMs
Method
Prompting Template
Tasks and Datasets
Named Entity Recognition
Relation Extraction
Natural Language Inference
Document Classification
Question Answering
Llama2-MedTuned Instructions
Training Configuration
...and 4 more sections

Figures (7)

Figure 1: Example outputs from Llama2-MedTuned-7B for biomedical tasks (left) and general medical instructions (right). The model demonstrates the application of instruction-based learning in NER by correctly labeling biomedical entities (left) and providing a relevant list in response to a medical inquiry (right).
Figure 2: Schematic representation of the process for fine-tuning Llama2 models with the proposed medical instruction dataset.
Figure 3: Overview of some of the prompt templates used in our instruction dataset.
Figure 4: Sample outputs of the Llama2 model and Llama2-MedTuned on Named Entity Recognition.
Figure 5: Sample outputs of the Llama2 model and Llama2-MedTuned on Relation Extraction.
...and 2 more figures
