
Named Entity Recognition in COVID-19 tweets with Entity Knowledge Augmentation

Xuankang Zhang, Jiangming Liu

TL;DR

The paper tackles named entity recognition (NER) in COVID-19 text by addressing data scarcity and domain-specific knowledge requirements through a novel LLM-based Entity Knowledge Augmentation (LLM-EKA) framework. The framework comprises demonstration selection, entity augmentation, and instance augmentation, which together align large language models with domain knowledge and improve NER performance on METS-CoV and BioRED in both fully-supervised and few-shot settings. Empirical results show that LLM-EKA outperforms traditional augmentation methods, standard NER baselines, and several LLM-based approaches. Iterative augmentation yields particularly strong gains, and accuracy is high on domain-specific entities such as drugs and vaccines. The approach does, however, rely on a fixed base model and proprietary LLM APIs, which raises practical considerations for deployment and leaves open future work on open-domain expansion and refinement in biomedical NER.
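The summary names demonstration selection as the first stage of the framework but does not say how demonstrations are chosen. As a minimal sketch only, assuming a simple token-overlap (Jaccard) retriever and a hypothetical prompt format (neither is confirmed as the authors' method), the idea of picking the annotated examples most similar to an input tweet and packing them into a few-shot prompt for an LLM could look like this. All function names, entity labels, and prompt wording are illustrative; no LLM is called here.

```python
from typing import List, Set, Tuple

# Hypothetical annotated example: (text, list of (entity span, entity type)).
Example = Tuple[str, List[Tuple[str, str]]]


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens; a deliberately simple stand-in for a real encoder."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_demonstrations(query: str, pool: List[Example], k: int = 2) -> List[Example]:
    """Return the k pool examples most similar to the query.

    Python's sort is stable (also with reverse=True), so ties keep pool order.
    """
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, demos: List[Example]) -> str:
    """Assemble a hypothetical few-shot prompt asking an LLM to annotate the query."""
    parts = ["Annotate the named entities and explain each with domain knowledge."]
    for text, ents in demos:
        ann = "; ".join(f"{span} [{etype}]" for span, etype in ents)
        parts.append(f"Text: {text}\nEntities: {ann}")
    parts.append(f"Text: {query}\nEntities:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    pool = [
        ("got my pfizer vaccine today", [("pfizer", "Vaccine")]),
        ("stock market falls again", []),
        ("is remdesivir a cure for covid", [("remdesivir", "Drug"), ("covid", "Disease")]),
    ]
    query = "second pfizer vaccine dose today"
    print(build_prompt(query, select_demonstrations(query, pool, k=1)))
```

Token overlap is chosen here only because it is dependency-free and easy to inspect; a practical system would more likely rank candidates with sentence embeddings, and the prompt returned by `build_prompt` would be sent to whichever LLM is used for augmentation.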

Abstract

The COVID-19 pandemic has caused severe social and economic disruption around the world, and its many facets are widely discussed on social media. Identifying pandemic-related named entities in social media posts is fundamental to understanding these discussions. However, little work has addressed named entity recognition on this topic, owing to two challenges: 1) COVID-19 texts on social media are informal, and their annotations are scarce and insufficient to train a robust recognition model; and 2) named entity recognition for COVID-19 requires extensive domain-specific knowledge. To address these issues, we propose a novel entity knowledge augmentation approach for COVID-19 that can also be applied to general biomedical named entity recognition in both informal and formal text. Experiments on a COVID-19 tweets dataset and a PubMed dataset show that the proposed entity knowledge augmentation improves NER performance in both fully-supervised and few-shot settings. Our source code is publicly available: https://github.com/kkkenshi/LLM-EKA/tree/master

Paper Structure

This paper contains 22 sections, 3 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: Examples of various data augmentation methods in few-shot settings. Named entities are shown in bold.
  • Figure 2: Framework of LLM-based Entity Knowledge Augmentation.