Table of Contents
Fetching ...

CLIMATELI: Evaluating Entity Linking on Climate Change Data

Shijia Zhou, Siyao Peng, Barbara Plank

TL;DR

CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia is presented, and automated filtering methods for CC entities are proposed.

Abstract

Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a first critical step to gaining new insights into CC. We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia. Using CLIMATELI (CLIMATe Entity LInking), we evaluate existing entity linking (EL) systems on the CC topic across various genres and propose automated filtering methods for CC entities. We find that the performance of EL models notably lags behind humans at both token and entity levels. Testing within the scope of retaining or excluding non-nominal and/or non-CC entities particularly impacts the models' performances.

CLIMATELI: Evaluating Entity Linking on Climate Change Data

TL;DR

CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia is presented, and automated filtering methods for CC entities are proposed.

Abstract

Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a first critical step to gaining new insights into CC. We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia. Using CLIMATELI (CLIMATe Entity LInking), we evaluate existing entity linking (EL) systems on the CC topic across various genres and propose automated filtering methods for CC entities. We find that the performance of EL models notably lags behind humans at both token and entity levels. Testing within the scope of retaining or excluding non-nominal and/or non-CC entities particularly impacts the models' performances.

Paper Structure

This paper contains 20 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Sample ClimatELi annotation.
  • Figure 2: Typed entity precision, recall, and F1 of Wikifier and TagMe on thresholds 0.1 to 1.0 (min to max recall).
  • Figure 3: NC-only entity length distributions per genre.