Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study
Zhaoyue Sun, Gabriele Pergola, Byron C. Wallace, Yulan He
TL;DR
The paper evaluates ChatGPT for pharmacovigilance event extraction, comparing zero-shot and few-shot prompting with various demonstration strategies against fine-tuned baselines on the PHEE dataset. It shows that while ChatGPT can achieve reasonable performance with well-chosen demonstrations, fully fine-tuned small models still outperform it, especially with ample data. The study also investigates using ChatGPT for data augmentation, finding that unfiltered synthesized data degrades performance, whereas targeted quality filters stabilize results without surpassing supervised models. The findings provide practical guidance on when LLM-based prompting is advantageous and highlight data quality and task complexity as critical factors for pharmacovigilance applications.
Abstract
With the advent of large language models (LLMs), there has been growing interest in exploring their potential for medical applications. This research investigates the ability of LLMs, specifically ChatGPT, to perform pharmacovigilance event extraction, whose main goal is to identify and extract adverse events or potential therapeutic events from textual medical sources. We conduct extensive experiments to assess ChatGPT's performance on this task, employing various prompts and demonstration selection strategies. The findings show that while ChatGPT achieves reasonable performance with appropriate demonstration selection strategies, it still falls short of fully fine-tuned small models. Additionally, we explore the potential of leveraging ChatGPT for data augmentation. However, our investigation reveals that including synthesized data in fine-tuning may decrease performance, possibly because of noise in the ChatGPT-generated labels. To mitigate this, we explore different filtering strategies and find that, with the proper approach, more stable performance can be achieved, although consistent improvement remains elusive.
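
To make the idea of demonstration selection for few-shot prompting concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple token-overlap (Jaccard) similarity to pick the labeled examples closest to the query sentence; the strategies actually compared in the paper are not specified in this section. The example pool, the prompt wording, and all function names are hypothetical.

```python
import re

# Hypothetical labeled pool; the sentences and labels are illustrative only.
POOL = [
    {"sentence": "The patient developed a rash after taking amoxicillin.",
     "event_type": "adverse event"},
    {"sentence": "Headache resolved after treatment with ibuprofen.",
     "event_type": "potential therapeutic event"},
    {"sentence": "She developed severe nausea after taking methotrexate.",
     "event_type": "adverse event"},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_demonstrations(query, pool, k=2):
    """Return the k pool examples most similar to the query sentence.

    Assumption: lexical overlap stands in for whatever similarity
    measure a real system would use (e.g. sentence embeddings).
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        pool,
        key=lambda ex: jaccard(query_tokens, tokenize(ex["sentence"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, demonstrations):
    """Assemble a few-shot prompt: instruction, demonstrations, then the query."""
    lines = [
        "Classify the pharmacovigilance event in each sentence as "
        "'adverse event' or 'potential therapeutic event'.",
        "",
    ]
    for demo in demonstrations:
        lines.append(f"Sentence: {demo['sentence']}")
        lines.append(f"Event type: {demo['event_type']}")
        lines.append("")
    lines.append(f"Sentence: {query}")
    lines.append("Event type:")
    return "\n".join(lines)


query = "He developed a rash after taking penicillin."
demos = select_demonstrations(query, POOL, k=2)
prompt = build_prompt(query, demos)
```

The resulting `prompt` string would then be sent to the model. The sketch covers only event-type classification; extracting full event structures would need a richer output format than shown here.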
