The Moralization Corpus: Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres

Maria Becker, Mirko Sommer, Lars Tapken, Yi Wan Teh, Bruno Brocai

TL;DR

The paper introduces the Moralization Corpus, a frame-based resource for analyzing how moral values, demands, and discourse protagonists structure moralizing speech across German genres. It details a multi-step annotation process, a three-layer annotation schema, and a data collection pipeline that yields 11,503 instances across seven genres, with rigorous evaluation of LLM prompting for moralization detection and component extraction. Experimental results show that detailed, task-specific prompts improve model performance modestly, but moralization remains highly subjective and context-dependent, highlighting gaps between automatic detection and human interpretation. The work provides a methodological foundation for cross-disciplinary studies in moral discourse and motivates future model tuning, broader multilingual annotation, and more fine-grained evaluation methods.

Abstract

Moralizations - arguments that invoke moral values to justify demands or positions - are an as yet underexplored form of persuasive communication. We present the Moralization Corpus, a novel multi-genre dataset designed to analyze how moral values are strategically used in argumentative discourse. Moralizations are pragmatically complex and often implicit, posing significant challenges for both human annotators and NLP systems. We develop a frame-based annotation scheme that captures the constitutive elements of moralizations - moral values, demands, and discourse protagonists - and apply it to a diverse set of German texts, including political debates, news articles, and online discussions. The corpus enables fine-grained analysis of moralizing language across communicative formats and domains. We further evaluate several large language models (LLMs) under varied prompting conditions on two tasks, moralization detection and moralization component extraction, and compare their outputs with human annotations in order to investigate the challenges of automatic and manual analysis of moralizations. Results show that detailed prompt instructions have a greater effect than few-shot or explanation-based prompting, and that moralization remains a highly subjective and context-sensitive task. We release all data, annotation guidelines, and code to foster future interdisciplinary research on moral discourse and moral reasoning in NLP.

Paper Structure

This paper contains 34 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Fully annotated example of a moralization frame, labeled with the demand, the supporting moral value and the protagonists (translation from German by the authors).
  • Figure 2: Binary moralization classification across models and prompting conditions (macro F1).
  • Figure 3: Mean PABAK Scores for different comparisons of agreement between and within human annotators and models. While Fleiss’ Kappa scores are on average 1–2 percentage points lower, they follow precisely the same tendencies as the other measures.
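
The PABAK measure named in the Figure 3 caption (prevalence-adjusted, bias-adjusted kappa) depends only on the observed agreement p_o between two label sequences: for k categories it is (k * p_o - 1) / (k - 1), which reduces to 2 * p_o - 1 for binary labels. The following is a minimal sketch of that formula, not the authors' evaluation code. The function names, the toy label lists, and the choice to average over all pairs of annotators are assumptions made for illustration.

```python
from itertools import combinations


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which two label sequences agree."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def pabak(labels_a, labels_b, num_categories=2):
    """PABAK = (k * p_o - 1) / (k - 1); equals 2 * p_o - 1 when k = 2."""
    p_o = observed_agreement(labels_a, labels_b)
    return (num_categories * p_o - 1) / (num_categories - 1)


def mean_pairwise_pabak(annotations, num_categories=2):
    """Average PABAK over all pairs of annotators (an assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two label sequences")
    return sum(pabak(a, b, num_categories) for a, b in pairs) / len(pairs)


# Hypothetical binary labels (1 = moralization, 0 = no moralization).
annotator_1 = [1, 0, 1, 1, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 1, 0, 0, 1, 1]
model_run = [1, 1, 1, 1, 0, 0, 1, 0]

print(pabak(annotator_1, annotator_2))
print(mean_pairwise_pabak([annotator_1, annotator_2, model_run]))
```

Unlike Cohen's or Fleiss' Kappa, this measure does not estimate chance agreement from the label distribution, so it is not affected by skewed class prevalence.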