Benchmarking Large Language Models in Biomedical Triple Extraction

Mingchen Li, Huixue Zhou, Rui Zhang

TL;DR

This work tackles biomedical triple extraction on two fronts: it evaluates large language models on sentence-level extraction, and it eases dataset scarcity with a new, richly annotated benchmark, GIT. GIT combines repurposed MTCD data with newly annotated CIHT sentences, covering 22 relation types across 4,691 sentences, and serves as a broad evaluation ground alongside established datasets such as CHEMPROT and DDI. The experiments show that larger, domain-informed models (e.g., LLaMA2-13B) outperform GPT-3.5/4 in this setting, while zero-shot GPT models lag behind, which highlights the importance of domain adaptation and prompt design. Together, the dataset and the comparative analysis support more robust LLM-driven biomedical relation extraction and knowledge-graph construction.

Abstract

Biomedical triple extraction systems aim to automatically extract biomedical entities and the relations between them. The application of large language models (LLMs) to triple extraction remains relatively unexplored. In this work, we focus mainly on sentence-level biomedical triple extraction. Furthermore, the absence of a high-quality biomedical triple extraction dataset impedes progress in developing robust triple extraction systems. To address these challenges, we first compare the performance of various large language models. We then present GIT, an expert-annotated biomedical triple extraction dataset that covers a wider range of relation types.
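To make the task concrete, here is a minimal sketch of how sentence-level triples could be represented and compared against gold annotations. The `Triple` class, the `score` helper, the example entities, and the relation labels are all hypothetical illustrations, not taken from GIT. Exact-match precision, recall, and F1 is assumed here as a common way to score extraction; this page does not state which metric the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A (head entity, relation, tail entity) triple from one sentence."""
    head: str
    relation: str
    tail: str


def score(predicted, gold):
    """Exact-match precision, recall, and F1 over sets of triples.

    Assumption: a predicted triple counts as correct only if head,
    relation, and tail all match a gold triple exactly.
    """
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: one gold triple, two model predictions.
gold = [Triple("aspirin", "treats", "headache")]
predicted = [
    Triple("aspirin", "treats", "headache"),
    Triple("aspirin", "causes", "nausea"),
]

precision, recall, f1 = score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Using frozen dataclasses makes the triples hashable, so set intersection gives the number of exact matches directly.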

Paper Structure (19 sections, 1 figure, 5 tables)

Figures (1)

  • Figure 1: Examples of Prompt 1 and Prompt 2, defined for GPT-3.5/4 on the triple extraction task.