MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions
Vjosa Preniqi, Iacopo Ghinassi, Julia Ive, Charalampos Saitis, Kyriaki Kalimeri
TL;DR
This work addresses automatic moral-value detection in social media text using Moral Foundations Theory (MFT) across multiple platforms. It introduces MoralBERT, a set of transformer models fine-tuned on three heterogeneous, manually annotated corpora (MFTC from Twitter, MFRC from Reddit, and FB from Facebook) with both aggregated and domain-adversarial training. The adversarial variant, MoralBERT_adv, uses a gradient reversal layer and dual objectives with losses $L_{moral}$ and $L_{dom}$ and regularizers $L_{norm}$ and $L_{rec}$. Compared with MoralStrength, Word2Vec+RF, and GPT-4 zero-shot classification, the framework achieves an average in-domain F1 score roughly 11% to 32% higher. Domain-adversarial training also generalizes better out of domain than aggregated training, although out-of-domain performance is only comparable to GPT-4 zero-shot classification. The work presents a resource-efficient, interpretable alternative to large LLMs for analyzing moral narratives in controversial debates, and points toward multilingual extensions and synthetic-data augmentation to broaden coverage and reduce annotation needs.
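To illustrate the gradient reversal idea mentioned above, here is a minimal sketch in plain Python. It is not the authors' implementation. The names `GradientReversal`, `shared_feature_gradient`, and `lam` are hypothetical, lists of floats stand in for tensors, and gradients are passed in by hand rather than computed by an autodiff framework. The sketch shows only the core mechanism: the layer is the identity in the forward pass and flips (and scales) the gradient in the backward pass, so the shared encoder is pushed to make domains harder to tell apart while still serving the moral-classification objective.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Hypothetical gradient reversal layer.

    Forward: returns the features unchanged.
    Backward: multiplies the incoming gradient by -lam.
    """
    lam: float = 1.0  # assumed reversal strength; not a value from the paper

    def forward(self, features: List[float]) -> List[float]:
        # Identity mapping: the domain head sees the same features.
        return list(features)

    def backward(self, grad_output: List[float]) -> List[float]:
        # Sign flip: the encoder receives the negated domain gradient.
        return [-self.lam * g for g in grad_output]


def shared_feature_gradient(
    grad_moral: List[float], grad_domain: List[float], lam: float
) -> List[float]:
    """Combine the gradients that reach the shared features from both heads.

    grad_moral:  gradient of the moral loss w.r.t. the shared features.
    grad_domain: gradient of the domain loss w.r.t. the shared features,
                 before it passes back through the reversal layer.
    """
    grl = GradientReversal(lam)
    reversed_domain = grl.backward(grad_domain)
    return [gm + gd for gm, gd in zip(grad_moral, reversed_domain)]
```

With `lam = 0` the domain head has no influence on the shared features, which corresponds to training on the moral objective alone. Larger values of `lam` put more weight on domain invariance.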
Abstract
Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements around important social issues. Controversial topics, including vaccination, abortion, racism, and sexual orientation, often elicit opinions and attitudes that are not solely based on evidence but rather reflect moral worldviews. Recent advances in Natural Language Processing (NLP) show that moral values can be gauged in human-generated textual content. Building on Moral Foundations Theory (MFT), this paper introduces MoralBERT, a range of language representation models fine-tuned to capture moral sentiment in social discourse. We describe a framework for both aggregated and domain-adversarial training on multiple heterogeneous MFT human-annotated datasets sourced from Twitter (now X), Reddit, and Facebook that broaden textual content diversity in terms of social media audience interests, content presentation and style, and spreading patterns. We show that, for in-domain inference, the proposed framework achieves an average F1 score that is between 11% and 32% higher than lexicon-based approaches, Word2Vec embeddings, and zero-shot classification with large language models such as GPT-4. Domain-adversarial training yields better out-of-domain predictions than aggregated training while achieving performance comparable to zero-shot learning. Our approach contributes to annotation-free and effective morality learning, and provides useful insights towards a more comprehensive understanding of moral narratives in controversial social debates using NLP.
