Benchmarking Large Language Models in Biomedical Triple Extraction
Mingchen Li, Huixue Zhou, Rui Zhang
TL;DR
This work evaluates large language models on sentence-level biomedical triple extraction and addresses dataset scarcity with a new, richly annotated benchmark, GIT. The dataset combines repurposed MTCD data with newly annotated CIHT sentences, yielding 4,691 sentences that span 22 relation types and offering a broad evaluation ground alongside established datasets such as CHEMPROT and DDI. In the experiments, larger, domain-informed models (e.g., LLaMA2-13B) outperform GPT-3.5/4, while zero-shot GPT models lag behind, highlighting the importance of domain adaptation and prompt design. Together, the dataset and comparative analysis support more robust LLM-driven biomedical relation extraction and knowledge-graph construction.
Abstract
Biomedical triple extraction systems aim to automatically extract biomedical entities and the relations between them. The application of large language models (LLMs) to triple extraction remains relatively unexplored. In this work, we focus mainly on sentence-level biomedical triple extraction. Furthermore, the absence of a high-quality biomedical triple extraction dataset impedes progress in developing robust triple extraction systems. To address these challenges, we first compare the performance of various large language models. We then present GIT, an expert-annotated biomedical triple extraction dataset that covers a wider range of relation types.
