EcoVerse: An Annotated Twitter Dataset for Eco-Relevance Classification, Environmental Impact Analysis, and Stance Detection

Francesca Grasso, Stefano Locci, Giovanni Siragusa, Luigi Di Caro

TL;DR

EcoVerse addresses the need for NLP resources that span environmental topics beyond climate change by introducing a 3,023-tweet English Twitter dataset annotated for Eco-Relevance, Environmental Impact Analysis, and Stance Detection. The authors designed and applied a three-level annotation scheme, achieving high inter-annotator agreement after iterative reconciliation, and reported baseline performance using six transformer models including ClimateBERT, RoBERTa, and DistilRoBERTa. Key findings show DistilRoBERTa and RoBERTa performing best on eco-relevance, while the ClimateBERT variants perform less well, and stance detection benefits from general-purpose models; a bias analysis of the climate-scam hashtag reveals its influence on classification. The dataset and baseline results, along with the ethical CO2 accounting, position EcoVerse as a valuable resource for policy analysis, awareness campaigns, and further research into diverse environmental narratives and language models tailored to ecological discourse.

Abstract

The anthropogenic ecological crisis constitutes a significant challenge that all within the academy must urgently face, including the Natural Language Processing (NLP) community. While recent years have seen increasing work revolving around climate-centric discourse, crucial environmental and ecological topics outside of climate change remain largely unaddressed, despite their importance. Mainstream NLP tasks, such as sentiment analysis, dominate the scene, but there remains an untouched space in the literature involving the analysis of environmental impacts of certain events and practices. To address this gap, this paper presents EcoVerse, an annotated English Twitter dataset of 3,023 tweets spanning a wide spectrum of environmental topics. We propose a three-level annotation scheme designed for Eco-Relevance Classification and Stance Detection, and introduce an original approach for Environmental Impact Analysis. We detail the data collection, filtering, and labeling process that led to the creation of the dataset. Remarkable Inter-Annotator Agreement indicates that the annotation scheme produces consistent annotations of high quality. Subsequent classification experiments using BERT-based models, including ClimateBERT, are presented. These yield encouraging results, while also indicating room for a model specifically tailored for environmental texts. The dataset is made freely available to stimulate further research.

Paper Structure (24 sections, 1 figure, 10 tables)

This paper contains 24 sections, 1 figure, 10 tables.

Figures (1)

  • Figure 1: The annotation begins with an analysis of the tweet text to assess its ecological relevance. If the tweet is categorized as eco-related, the subsequent steps are Environmental Impact Analysis and Stance Detection. Conversely, for tweets deemed not eco-related, the annotation is finalized and the annotator moves on to the next tweet.
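
The gated flow described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' annotation tooling: the `TweetAnnotation` record, the `annotate` function, and the three judge callables are hypothetical names, and the label values used in the usage lines ("negative", "supportive") are placeholders rather than the dataset's actual label set.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TweetAnnotation:
    """One annotated tweet (hypothetical record layout)."""
    text: str
    eco_related: bool
    impact: Optional[str] = None  # set only for eco-related tweets
    stance: Optional[str] = None  # set only for eco-related tweets


def annotate(
    text: str,
    is_eco_related: Callable[[str], bool],
    judge_impact: Callable[[str], str],
    judge_stance: Callable[[str], str],
) -> TweetAnnotation:
    """Apply the three levels in order, stopping early when the tweet is off-topic."""
    # Level 1: eco-relevance gates everything else.
    if not is_eco_related(text):
        # Not eco-related: annotation is finalized with no further labels.
        return TweetAnnotation(text=text, eco_related=False)
    # Levels 2 and 3 are reached only for eco-related tweets.
    return TweetAnnotation(
        text=text,
        eco_related=True,
        impact=judge_impact(text),
        stance=judge_stance(text),
    )


# Toy usage with stand-in judges; a human annotator or a classifier would go here.
example = annotate(
    "Oil spill reported near the coast",
    is_eco_related=lambda t: "spill" in t.lower(),
    judge_impact=lambda t: "negative",    # placeholder label
    judge_stance=lambda t: "supportive",  # placeholder label
)
```

Passing the judges as callables keeps the control flow separate from whoever makes each decision, so the same skeleton could wrap manual labeling or a model's predictions.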