A Structure-aware Generative Model for Biomedical Event Extraction

Haohan Yuan, Siu Cheung Hui, Haopeng Zhang

TL;DR

GenBEE tackles biomedical event extraction by injecting structure-aware prompts and prefixes into a generative model so that it can capture nested and overlapping events. It builds event prompts from type descriptions and LLM-generated event templates, and uses BioBERT to learn structure-aware prefixes that condition BART. The model handles trigger detection and argument extraction jointly, achieving state-of-the-art performance on MLEE and GE11 and competitive results on PHEE; ablations indicate that both the prompts and the prefixes are crucial. In few-shot settings, GenBEE outperforms LLMs as the amount of labeled data increases, which points to its data efficiency and practical utility for biomedical text mining.
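
As a rough illustration of how a type description and an event template might be combined with an input passage into a single encoder input for a sequence-to-sequence model such as BART, consider the minimal sketch below. The names EventSchema and build_event_prompt, the separator string, the ordering of the parts, and the wording of the description and template are all hypothetical and are not taken from the paper. The learned prefixes are not shown.

```python
from dataclasses import dataclass


@dataclass
class EventSchema:
    """Hypothetical container for the prompt material of one event type."""
    event_type: str
    description: str  # type description prompt (label semantics)
    template: str     # event template with role placeholders


def build_event_prompt(passage: str, schema: EventSchema, sep: str = " </s> ") -> str:
    """Join passage, type description and template into one input string.

    The ordering and the separator are assumptions made for illustration only.
    """
    parts = [passage.strip(), schema.description.strip(), schema.template.strip()]
    return sep.join(parts)


# Illustrative schema; the wording is invented, not quoted from the paper.
schema = EventSchema(
    event_type="Positive_regulation",
    description="Positive regulation: a process that increases the rate or extent of another process.",
    template="Trigger <trigger> ; <cause> positively regulates <theme>",
)

prompt = build_event_prompt("IL-2 induces expression of the c-fos gene.", schema)
print(prompt)
```

In a generative setup of this kind, the decoder would be trained to emit the template with its placeholders filled in, so that the trigger and the arguments can be read off from a single output sequence.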

Abstract

Biomedical Event Extraction (BEE) is a challenging task that involves modeling complex relationships between fine-grained entities in biomedical text. BEE has traditionally been formulated as a classification problem. With recent advances in large language models (LLMs), generation-based models that cast event extraction as a sequence generation problem have attracted attention in the NLP research community. However, current generative models often overlook cross-instance information in complex event structures, such as nested and overlapping events, which constitute over 20% of events in benchmark datasets. In this paper, we propose GenBEE, an event structure-aware generative model that captures complex event structures in biomedical text. GenBEE constructs event prompts that distill knowledge from LLMs to incorporate both label semantics and argument dependency relationships. In addition, GenBEE generates prefixes from structural event prompts, incorporating structural features that improve the model's overall performance. We evaluate GenBEE on three widely used BEE benchmark datasets: MLEE, GE11, and PHEE. Experimental results show that GenBEE achieves state-of-the-art performance on the MLEE and GE11 datasets and competitive results on the PHEE dataset when compared with state-of-the-art classification-based models.
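
To make the notion of nested and overlapping events concrete, the sketch below represents events as a trigger span plus role-labelled argument spans and flags the two structures. It assumes one common reading of the terms, which the abstract itself does not define: an event is nested when one of its arguments is the trigger of another event, and two events overlap when they share a trigger span or an argument span. The class, the function names, the character offsets and the example annotations are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets; an assumed convention


@dataclass
class Event:
    event_type: str
    trigger: Span
    arguments: Dict[str, Span] = field(default_factory=dict)  # role -> span


def nested_pairs(events: List[Event]) -> List[Tuple[int, int]]:
    """Return (outer, inner) index pairs where an argument of outer is the trigger of inner."""
    pairs = []
    for i, outer in enumerate(events):
        for j, inner in enumerate(events):
            if i != j and inner.trigger in outer.arguments.values():
                pairs.append((i, j))
    return pairs


def overlapping_pairs(events: List[Event]) -> List[Tuple[int, int]]:
    """Return index pairs of events that share a trigger span or an argument span."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        same_trigger = a.trigger == b.trigger
        shared_args = set(a.arguments.values()) & set(b.arguments.values())
        if same_trigger or shared_args:
            pairs.append((i, j))
    return pairs


# Hand-made annotations for "IL-2 induces expression of the c-fos gene."
events = [
    Event("Gene_expression", trigger=(13, 23), arguments={"Theme": (31, 36)}),
    Event("Positive_regulation", trigger=(5, 12),
          arguments={"Theme": (13, 23), "Cause": (0, 4)}),
]

print(nested_pairs(events))       # the regulation event takes the expression event as its Theme
print(overlapping_pairs(events))  # no shared trigger or argument span in this example
```

A model that extracts each event in isolation has no direct way to see that the Theme of the second event is itself the trigger of the first, which is the kind of cross-instance information the abstract refers to.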

Paper Structure

This paper contains 13 sections, 9 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: An example of nested events identified from the given text.
  • Figure 2: Comparison of generative and classification-based models in extracting nested events from biomedical text.
  • Figure 3: The overall architecture of our proposed GenBEE model.
  • Figure 4: Experimental results using LLMs and GenBEE with few-shot learning based on the MLEE and GE11 datasets. All reported results are in Arg-C F1-scores (%).
  • Figure 5: A case study based on two examples taken from the GE11 dataset.
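
The Arg-C F1-scores mentioned in the caption of Figure 4 can be illustrated with a small sketch. It assumes a convention that is common in event extraction evaluation, namely that a predicted argument counts as correct only when its event type, role and span all match a gold argument. The exact matching criteria used in the paper are not stated in this section, and the tuples below are made-up examples rather than dataset entries.

```python
from typing import Set, Tuple

# (event_type, role, start, end); a hypothetical encoding of one argument
Argument = Tuple[str, str, int, int]


def f1_score(predicted: Set[Argument], gold: Set[Argument]) -> float:
    """Micro F1 over exact-match argument tuples; 0.0 when either set is empty."""
    if not predicted or not gold:
        return 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {
    ("Gene_expression", "Theme", 31, 36),
    ("Positive_regulation", "Theme", 13, 23),
    ("Positive_regulation", "Cause", 0, 4),
}
predicted = {
    ("Gene_expression", "Theme", 31, 36),
    ("Positive_regulation", "Theme", 13, 23),
    ("Positive_regulation", "Cause", 5, 12),  # wrong span, so not counted
}

print(f"Arg-C F1: {100 * f1_score(predicted, gold):.2f}%")
```

Here two of the three predictions match a gold argument exactly, so precision and recall are both 2/3 and the F1-score equals 2/3 as well.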