Improving Small Language Models on PubMedQA via Generative Data Augmentation

Zhen Guo, Peiqi Wang, Yanwei Wang, Shangdi Yu

TL;DR

The paper tackles the efficiency gap between large and small language models in medical QA by applying LLM-based generative data augmentation. It combines parameter-efficient fine-tuning methods (LoRA and Prefix Tuning) with LLM-generated rewrites and new QA pairs, evaluating on PubMedQA. Key findings: LoRA is robust across hyperparameters, an LLM with domain knowledge (GPT-4) yields the most beneficial augmented data, and an SLM with under 1.6B parameters can surpass few-shot GPT-4 on PubMedQA. The data and code are publicly released, pointing to a practical path to domain specialization at much lower compute cost than LLMs.

Abstract

Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing. However, their increasing size poses challenges in terms of computational cost. On the other hand, Small Language Models (SLMs) are known for their efficiency, but they often struggle with limited capacity and training data, especially in specific domains. In this paper, we introduce a novel method aimed at improving SLMs in the medical domain using LLM-based generative data augmentation. The objective of our approach is to develop more efficient and capable models that are specifically tailored for specialized applications. Through experiments conducted on the PubMedQA dataset, we demonstrate the effectiveness of LLMs in refining and diversifying existing question-answer pairs. This refinement process leads to improved performance in a significantly smaller model after fine-tuning. Notably, our best SLM, with under 1.6 billion parameters, outperforms the few-shot GPT-4 on the PubMedQA dataset. Our code and generated data are publicly available to facilitate further explorations.
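
As a rough illustration of what refining and diversifying existing question-answer pairs could look like in code, the sketch below expands a training set with rewritten questions while keeping each original label. It is a minimal sketch under stated assumptions, not the authors' pipeline: `QAExample`, `augment`, and `toy_rewrite` are hypothetical names, and `toy_rewrite` is a placeholder standing in for a call to an LLM such as GPT-4.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class QAExample:
    """Hypothetical container for one PubMedQA-style training example."""
    question: str
    context: str
    answer: str  # PubMedQA labels are "yes", "no", or "maybe"


def augment(
    examples: Iterable[QAExample],
    rewrite: Callable[[str], str],
    n_variants: int = 1,
) -> List[QAExample]:
    """Return the originals plus up to n_variants rewritten copies of each.

    Only the question is rewritten; context and answer are carried over
    unchanged. Duplicate (question, context) pairs are dropped.
    """
    out: List[QAExample] = []
    seen = set()
    for ex in examples:
        candidates = [ex] + [
            replace(ex, question=rewrite(ex.question)) for _ in range(n_variants)
        ]
        for cand in candidates:
            key = (cand.question.strip().lower(), cand.context)
            if key not in seen:
                seen.add(key)
                out.append(cand)
    return out


def toy_rewrite(question: str) -> str:
    # Assumption: placeholder for an LLM paraphrase call; it only adds a prefix.
    return "In other words: " + question
```

The rewriting function is passed in as a parameter so the same loop can be exercised with a deterministic stub in tests and with a real LLM client in practice. The augmented list would then be the input to fine-tuning, for example with LoRA or Prefix Tuning as the paper describes.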

Paper Structure (11 sections, 1 figure, 3 tables)