MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions

Vjosa Preniqi, Iacopo Ghinassi, Julia Ive, Charalampos Saitis, Kyriaki Kalimeri

TL;DR

This work tackles automatic moral-value detection in social media text using Moral Foundations Theory (MFT) across multiple platforms. It introduces MoralBERT, a set of transformer models fine-tuned on three heterogeneous, manually annotated corpora (MFTC from Twitter, MFRC from Reddit, and FB from Facebook) with both aggregated and domain-adversarial training. The adversarial variant uses a gradient reversal layer and dual objectives with losses $L_{moral}$ and $L_{dom}$, plus regularizers $L_{norm}$ and $L_{rec}$. Compared against MoralStrength, Word2Vec+RF, and GPT-4 zero-shot, MoralBERT_adv achieves an in-domain F1 that is approximately 11–32% higher and shows improved out-of-domain generalization, though gains are more modest outside the training domains. The work demonstrates a resource-efficient, interpretable alternative to large LLMs for analyzing moral narratives in controversial debates and paves the way for multilingual extensions and synthetic-data augmentation to broaden coverage and reduce annotation needs.
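
For readers unfamiliar with the gradient reversal layer mentioned above, the following sketches its standard definition in domain-adversarial training; it is not a formula taken from this paper. The scaling factor $\lambda$, the encoder output $f$, and the encoder parameters $\theta_f$ are notation assumed here for illustration. How $L_{norm}$ and $L_{rec}$ enter the total objective is not stated in this summary, so they are left out.

$$
R_\lambda(x) = x, \qquad \frac{\partial R_\lambda(x)}{\partial x} = -\lambda I
$$

$$
\frac{\partial L_{dom}\big(R_\lambda(f)\big)}{\partial \theta_f} = -\lambda \, \frac{\partial L_{dom}}{\partial f} \, \frac{\partial f}{\partial \theta_f}
$$

Under this definition the domain classifier still minimizes $L_{dom}$, while the shared encoder receives the negated gradient and is pushed toward features that do not separate the source platforms. The gradient of $L_{moral}$ reaches the encoder unchanged.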

Abstract

Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements around important social issues. Controversial topics, including vaccination, abortion, racism, and sexual orientation, often elicit opinions and attitudes that are not solely based on evidence but rather reflect moral worldviews. Recent advances in Natural Language Processing (NLP) show that moral values can be gauged in human-generated textual content. Building on the Moral Foundations Theory (MFT), this paper introduces MoralBERT, a range of language representation models fine-tuned to capture moral sentiment in social discourse. We describe a framework for both aggregated and domain-adversarial training on multiple heterogeneous MFT human-annotated datasets sourced from Twitter (now X), Reddit, and Facebook that broaden textual content diversity in terms of social media audience interests, content presentation and style, and spreading patterns. We show that the proposed framework achieves an average F1 score that is between 11% and 32% higher than lexicon-based approaches, Word2Vec embeddings, and zero-shot classification with large language models such as GPT-4 for in-domain inference. Domain-adversarial training yields better out-of-domain predictions than aggregate training while achieving comparable performance to zero-shot learning. Our approach contributes to annotation-free and effective morality learning, and provides useful insights towards a more comprehensive understanding of moral narratives in controversial social debates using NLP.

Paper Structure (7 sections, 2 figures, 5 tables)

Figures (2)

  • Figure 1: UMAP visualisation of feature distributions across datasets (domains). The left graph includes both moral and non-moral (neutral) labelled text, the right graph includes only moral labelled text. Dots represent mean-pooled BERT embeddings for each text example.
  • Figure 2: Out-of-domain classification: for each test dataset, models are fine-tuned on the other two datasets. Bar heights represent F1 Binary and Macro average scores; error bars indicate standard deviation estimated via 1,000 bootstraps.
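
The error bars described for Figure 2 can be illustrated with a minimal sketch of bootstrap resampling for binary and macro F1. This is an assumption-laden illustration, not the authors' evaluation code: the function names, the toy labels, the fixed seed, and the use of the sample standard deviation are all hypothetical choices, and macro F1 is taken here as the unweighted mean of the F1 scores of the positive and negative class.

```python
import random
from statistics import mean, stdev


def f1_binary(y_true, y_pred, positive=1):
    """F1 for one class treated as positive; returns 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over the two classes (0 and 1)."""
    return mean(f1_binary(y_true, y_pred, positive=c) for c in (0, 1))


def bootstrap_f1(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Resample examples with replacement; return (mean, std) of the metric."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    return mean(scores), stdev(scores)


# Hypothetical toy labels for a single moral foundation (1 = present).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

point_binary = f1_binary(y_true, y_pred)
point_macro = f1_macro(y_true, y_pred)
boot_mean, boot_std = bootstrap_f1(y_true, y_pred, f1_binary)
print(f"F1 binary = {point_binary:.3f}, F1 macro = {point_macro:.3f}")
print(f"bootstrap mean = {boot_mean:.3f}, std = {boot_std:.3f}")
```

In a plot like Figure 2, the point estimate would give the bar height and the bootstrap standard deviation the error bar. With only ten toy examples the spread is wide, and it would narrow on a realistically sized test set.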