Emotion Identification for French in Written Texts: Considering their Modes of Expression as a Step Towards Text Complexity Analysis

Aline Étienne, Delphine Battistelli, Gwénolé Lecorvé

TL;DR

This work addresses emotion identification in non-conversational French texts, treating the mode in which an emotion is expressed as a factor of text complexity. It presents a psycho-linguistically grounded dataset and a CamemBERT-based multi-task model that jointly predicts (A) whether an emotion is present, (B) its mode(s) of expression, (C) whether it is basic or complex, and (D) its emotional category. The approach outperforms traditional baselines and GPT-3.5, and its predictions align closely with human judgments, especially for mode identification. The study is a step towards automatic analysis of text complexity at scale and suggests future directions: intra-sentential prediction, integration of broader notions, and leaderboards covering larger models, to further enhance applicability to complexity analysis.

Abstract

The objective of this paper is to predict (A) whether a sentence in a written text expresses an emotion, (B) the mode(s) in which it is expressed, (C) whether it is basic or complex, and (D) its emotional category. One of our major contributions, through a dataset and a model, is to integrate the fact that an emotion can be expressed in different modes: from a direct mode, essentially lexicalized, to a more indirect mode, where emotions are only suggested, a mode that NLP approaches generally do not take into account. Another original aspect is that the scope is on written texts, as opposed to the usual work focusing on conversational (often multi-modal) data. In this context, modes of expression are seen as a factor towards the automatic analysis of complexity in texts. Experiments on French texts show acceptable results compared to the human annotators' agreement, and results that outperform a large language model used with in-context learning (i.e., no fine-tuning).

Paper Structure (20 sections, 12 tables)