You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools

Daniel Baumartz, Mevlüt Bagci, Alexander Henlein, Maxim Konca, Andy Lücking, Alexander Mehler

TL;DR

Going beyond previous studies, it is shown that the sentiment tool used for sentiment annotation can even be predicted from its outcome, revealing an algorithmic bias of sentiment analysis.

Abstract

If sentiment analysis tools were valid classifiers, one would expect them to provide comparable results for sentiment classification on different kinds of corpora and for different languages. In line with the results of previous studies, we show that sentiment analysis tools disagree on the same dataset. Going beyond previous studies, we show that the sentiment tool used for sentiment annotation can even be predicted from its outcome, revealing an algorithmic bias of sentiment analysis. Based on Twitter, Wikipedia and different news corpora from the English, German and French languages, our classifiers separate sentiment tools with an averaged F1-score of 0.89 (for the English corpora). We therefore warn against taking sentiment annotations at face value and argue for the need for more, and more systematic, NLP evaluation studies.
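The core idea, that a tool can be identified from the scores it emits, can be illustrated with a toy sketch. This is not the authors' pipeline: the two tools (`tool_a`, `tool_b`), their score distributions, the chunk size, the (mean, standard deviation) features, the nearest-centroid classifier and the macro-averaging of F1 are all assumptions chosen for illustration, and the data is synthetic.

```python
import random
import statistics

def make_chunks(mean, spread, n_chunks, chunk_size, rng):
    """Draw synthetic sentiment scores; summarize each chunk as (mean, stdev)."""
    chunks = []
    for _ in range(n_chunks):
        scores = [rng.gauss(mean, spread) for _ in range(chunk_size)]
        chunks.append((statistics.mean(scores), statistics.stdev(scores)))
    return chunks

def centroid(points):
    """Component-wise mean of a list of feature tuples."""
    return tuple(statistics.mean(dim) for dim in zip(*points))

def predict(point, centroids):
    """Return the label of the centroid closest to the point."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(point, centroids[label]))
    return min(centroids, key=sq_dist)

def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores over the gold labels."""
    per_label = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        per_label.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return statistics.mean(per_label)

rng = random.Random(0)
# Hypothetical tools: (mean, spread) of the scores each one tends to emit.
TOOLS = {"tool_a": (0.3, 0.2), "tool_b": (-0.1, 0.5)}

train = {name: make_chunks(m, s, 40, 50, rng) for name, (m, s) in TOOLS.items()}
held_out = {name: make_chunks(m, s, 20, 50, rng) for name, (m, s) in TOOLS.items()}

centroids = {name: centroid(chunks) for name, chunks in train.items()}
gold = [name for name, chunks in held_out.items() for _ in chunks]
pred = [predict(chunk, centroids) for chunks in held_out.values() for chunk in chunks]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.2f}")
```

If each tool leaves a characteristic trace in its score distribution, even such a simple classifier separates them; a high F1 here reflects only the deliberately distinct synthetic distributions, not any result from the paper.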

Paper Structure

This paper contains 12 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Sentiments of English Wikipedia and Twitter hashtags #allenichtganzdicht and #SputnikV.
  • Figure 2: Distance correlation of sentiment scores for different tools on the EN C3 corpora, with darker color indicating higher correlation.
  • Figure 3: Per-tool rate of agreement with majority vote for the EN C3 corpus using 4 normalization methods.
  • Figure 4: Results of the NN-based classifier: mean $F_1$-scores over all languages and corpora, averaged over all chunk sizes and then over all normalization variants.
  • Figure 5: $F_1$-scores of the SVM classifier for the C3 and Europarl corpora. The x-axis indicates the number of features, which are selected randomly for each feature vector size independently.
  • ...and 7 more figures