Experiences from Creating a Benchmark for Sentiment Classification for Varieties of English

Dipankar Srirag, Jordan Painter, Aditya Joshi, Diptesh Kanojia

TL;DR

This paper shares experiences from an ongoing project to build a sentiment classification benchmark for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). Initial evaluation reveals significant performance variations influenced by sample characteristics, label semantics, and language variety.

Abstract

Existing benchmarks often fail to account for linguistic diversity, such as varieties of English. In this paper, we share our experiences from our ongoing project of building a sentiment classification benchmark for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). Using Google Places reviews, we explore the effects of sampling techniques based on label semantics, review length, and sentiment proportion, and report the performance of three fine-tuned BERT-based models. Our initial evaluation reveals significant performance variations influenced by sample characteristics, label semantics, and language variety, highlighting the need for nuanced benchmark design. We offer actionable insights for researchers to create robust benchmarks, emphasising the importance of diverse sampling, careful label definition, and comprehensive evaluation across linguistic varieties.

Paper Structure

This paper contains 24 sections, 1 equation, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Data Curation and Sampling for Sentiment Classification for varieties of English.
  • Figure 2: Confusion Matrix of label annotations by ant-IN and ant-UK.