Continuous sentiment scores for literary and multilingual contexts

Laurits Lyngbaek, Pascale Feldkamp, Yuri Bizzoni, Kristoffer Nielbo, Kenneth Enevoldsen

TL;DR

This paper tackles the challenge of obtaining fine-grained, continuous sentiment scores for literary texts across languages and historical periods. It introduces Concept Vector Projection (CVP), a method that builds a unit sentiment vector $\hat{\mathbf{v}}$ in a multilingual embedding space from positive and negative exemplars and projects each sentence embedding $\mathbf{e}_i$ onto this vector to yield a continuous score $s = \mathbf{e}_i \cdot \hat{\mathbf{v}}$. Empirical results on English and Danish data (Fiction4) and EmoBank show that CVP outperforms dictionary-based tools and most transformer-based baselines, producing scores that more closely match human ratings and display desirable continuity across genres and time. The approach demonstrates robust cross-lingual and diachronic generalization and holds promise for extension to other abstract concepts or sentiment-related tasks in literary analytics.
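A minimal sketch of the projection step in plain Python follows. It assumes that $\hat{\mathbf{v}}$ is the normalized difference between the mean positive and mean negative exemplar embeddings (the summary only says the vector is built "from positive and negative exemplars"), and it uses toy 3-dimensional vectors in place of real multilingual sentence embeddings. The helper names are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def concept_vector(positive, negative) -> Vector:
    """Assumed construction: unit vector from the negative centroid to the positive centroid."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return normalize([p - q for p, q in zip(pos_mean, neg_mean)])


def project(embedding, v_hat) -> float:
    """Continuous sentiment score s = e_i . v_hat (dot product with the unit vector)."""
    return sum(e * v for e, v in zip(embedding, v_hat))


# Toy 3-d "sentence embeddings" standing in for a multilingual encoder's output.
positive = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
negative = [[-0.7, 0.1, 0.0], [-0.9, 0.0, 0.1]]
v_hat = concept_vector(positive, negative)

# Hypothetical unlabeled sentences to be scored.
unlabeled = {"hopeful_sentence": [0.5, 0.0, 0.3], "grim_sentence": [-0.4, 0.2, 0.0]}
scores = {name: project(e, v_hat) for name, e in unlabeled.items()}
```

Because $\hat{\mathbf{v}}$ has unit length, each score is the signed length of the embedding's component along the sentiment direction, so the output is a real number rather than a categorical label.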

Abstract

Sentiment Analysis is widely used to quantify sentiment in text, but its application to literary texts poses unique challenges due to figurative language, stylistic ambiguity, and sentiment evocation strategies. Traditional dictionary-based tools often underperform, especially for low-resource languages, and transformer models, while promising, typically output coarse categorical labels that limit fine-grained analysis. We introduce a novel continuous sentiment scoring method based on concept vector projection, trained on multilingual literary data, which more effectively captures nuanced sentiment expressions across genres, languages, and historical periods. Our approach outperforms existing tools on English and Danish texts, producing sentiment scores whose distribution closely matches human ratings, enabling more accurate analysis and sentiment arc modeling in literature.

Paper Structure

This paper contains 20 sections, 2 equations, 6 figures, 5 tables, and 1 algorithm.

Figures (6)

  • Figure 1: An overview of how a concept vector for sentiment is constructed and what information it contains. A circle represents an embedded document.
  • Figure 2: A visualization of how the Concept Vector Projection is constructed. It shows how to use a labeled sentiment corpus to predict sentiments of an unlabeled corpus of interest. The vectors shown are reduced to a two-dimensional Euclidean space for visualization, but normally reside in a high-dimensional space.
  • Figure 3: Scatterplots of sentiment predictions from Sentiment Projection and xlm-roberta, respectively. Although the xlm-roberta model can in principle predict a continuous range of sentiments when its outputs are transformed with confidence scores, inspection shows that certain ranges of the sentiment spectrum are not used. While both models achieve high correlations, xlm-roberta appears to achieve this by matching the human tendency to rate sentences as neutral.
  • Figure 4: Scatterplots of Sentiment Projection and xlm-roberta predictions for the EmoBank data.
  • Figure 5: Histograms of human ratings, Sentiment Projection predictions, and xlm-roberta predictions for the Fiction4 test set. This plot should be read together with the Fiction4 and EmoBank scatterplots (Figures 3 and 4). It shows that the xlm-roberta model follows the human tendency to rate many sentences as completely neutral. Sentiment Projection also predicts mostly neutral sentences, as intended, but its scores follow a bell-shaped distribution, a pattern that also emerges in human ratings as the number of raters increases (see the EmoBank histograms).
  • ...and 1 more figure
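Figure 2 describes the end-to-end workflow (a labeled sentiment corpus defines the sentiment direction, which is then used to score a corpus of interest), and Figures 3 and 5 compare the resulting scores with human ratings through correlation and histograms. The Python sketch below mirrors that workflow under several assumptions not stated in the captions: exemplars are picked from the labeled corpus with hypothetical rating thresholds `low` and `high`, the concept vector is the normalized difference of exemplar means, agreement is measured with Pearson correlation, ratings lie on an assumed -1 to 1 scale, and toy 2-dimensional vectors stand in for multilingual sentence embeddings.

```python
import math
from collections import Counter


def concept_vector_from_labels(embeddings, ratings, low, high):
    """Hypothetical exemplar selection: ratings >= high are positive, <= low are negative."""
    pos = [e for e, r in zip(embeddings, ratings) if r >= high]
    neg = [e for e, r in zip(embeddings, ratings) if r <= low]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative exemplar")
    # Difference of per-dimension means (positive centroid minus negative centroid).
    diff = [sum(p) / len(pos) - sum(q) / len(neg) for p, q in zip(zip(*pos), zip(*neg))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project_all(embeddings, v_hat):
    """Score every embedding by its dot product with the unit concept vector."""
    return [sum(a * b for a, b in zip(e, v_hat)) for e in embeddings]


def pearson(xs, ys):
    """Pearson correlation between predicted scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def histogram(values, lo, hi, bins):
    """Counts per equal-width bin on [lo, hi]; out-of-range values are clamped to the edge bins."""
    width = (hi - lo) / bins
    counts = Counter(min(max(int((v - lo) / width), 0), bins - 1) for v in values)
    return [counts.get(b, 0) for b in range(bins)]


# Toy labeled corpus: embeddings with human ratings on an assumed -1..1 scale.
labeled_emb = [[1.0, 0.2], [0.8, -0.1], [-0.9, 0.1], [-1.0, -0.2], [0.1, 0.0]]
labeled_ratings = [0.9, 0.7, -0.8, -0.9, 0.0]
v_hat = concept_vector_from_labels(labeled_emb, labeled_ratings, low=-0.5, high=0.5)

# Toy held-out sentences; in a real corpus of interest the ratings would be absent
# and are used here only to evaluate the projected scores.
test_emb = [[0.6, 0.0], [-0.5, 0.3], [0.05, 0.1], [-0.2, -0.1]]
test_ratings = [0.6, -0.4, 0.1, -0.1]

predicted = project_all(test_emb, v_hat)
r = pearson(predicted, test_ratings)
hist_pred = histogram(predicted, -1.0, 1.0, bins=4)
hist_human = histogram(test_ratings, -1.0, 1.0, bins=4)
```

Comparing the two histograms bin by bin is only a crude stand-in for the visual comparison in Figure 5; with real data one would use many more sentences and finer bins.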