Continuous sentiment scores for literary and multilingual contexts
Laurits Lyngbaek, Pascale Feldkamp, Yuri Bizzoni, Kristoffer Nielbo, Kenneth Enevoldsen
TL;DR
This paper tackles the challenge of obtaining fine-grained, continuous sentiment scores for literary texts across languages and historical periods. It introduces Concept Vector Projection (CVP), a method that builds a unit sentiment vector in a multilingual embedding space from positive and negative exemplars and projects sentence embeddings onto this vector to yield a continuous score $s_i = \mathbf{e}_i \cdot \hat{\mathbf{v}}$. Empirical results on English and Danish data (Fiction4) and EmoBank show that CVP outperforms dictionary-based tools and most transformer-based baselines, producing scores that more closely match human ratings and display desirable continuity across genres and time. The approach demonstrates robust cross-lingual and diachronic generalization and holds promise for extending to other abstract concepts or sentiment-related tasks in literary analytics.
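The projection step described above can be illustrated with a minimal sketch. It assumes the concept vector is the difference between the mean positive and mean negative exemplar embeddings, normalized to unit length; the summary only states that the vector is built from positive and negative exemplars, so this construction is an assumption, not necessarily the authors' procedure. The function names `concept_vector` and `project` are hypothetical, and the toy 3-dimensional arrays stand in for real multilingual sentence embeddings.

```python
import numpy as np


def concept_vector(pos_embeddings, neg_embeddings):
    """Build a unit concept vector from exemplar embeddings.

    Assumption: difference of class means, then L2 normalization.
    """
    pos = np.asarray(pos_embeddings, dtype=float)
    neg = np.asarray(neg_embeddings, dtype=float)
    v = pos.mean(axis=0) - neg.mean(axis=0)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("positive and negative exemplar means coincide")
    return v / norm


def project(embeddings, v_hat):
    """Score each embedding e_i by its dot product with the unit vector."""
    return np.asarray(embeddings, dtype=float) @ v_hat


# Toy data (hypothetical): in practice these rows would come from a
# multilingual sentence encoder applied to exemplar and target sentences.
pos = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
neg = np.array([[-1.0, 0.0, 0.0], [-3.0, 0.0, 0.0]])
sentences = np.array([[0.5, 2.0, 1.0], [-0.25, 1.0, 0.0]])

v_hat = concept_vector(pos, neg)
scores = project(sentences, v_hat)
print(scores)
```

Because the concept vector has unit length, each score is the signed length of the sentence embedding's component along the sentiment direction, so the scores are continuous rather than categorical.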
Abstract
Sentiment analysis is widely used to quantify sentiment in text, but its application to literary texts poses unique challenges due to figurative language, stylistic ambiguity, and sentiment evocation strategies. Traditional dictionary-based tools often underperform, especially for low-resource languages, and transformer models, while promising, typically output coarse categorical labels that limit fine-grained analysis. We introduce a novel continuous sentiment scoring method based on concept vector projection, trained on multilingual literary data, which more effectively captures nuanced sentiment expressions across genres, languages, and historical periods. Our approach outperforms existing tools on English and Danish texts, producing sentiment scores whose distribution closely matches human ratings and enabling more accurate analysis and sentiment arc modeling in literature.
