Measuring Moral Dimensions in Social Media with Mformer
Tuan Dung Nguyen, Ziyu Chen, Nicholas George Carroll, Alasdair Tran, Colin Klein, Lexing Xie
TL;DR
This paper addresses the instability and domain dependence of lexicon-based moral foundation detection by fine-tuning a RoBERTa model on diverse data sources. The authors introduce Mformer, a set of five binary classifiers, one per moral foundation, and show that it outperforms word-count, embedding, and logistic regression baselines, with AUC gains of 4-12% in-domain and up to 17% on four external datasets. They apply Mformer in case studies on Reddit and Twitter, revealing topic- and stance-dependent patterns in moral rhetoric and highlighting differences from prior lexicon-based analyses. The models and datasets are publicly released, and the authors argue that Mformer enables robust, reproducible quantification of moral dimensions across data domains, with implications for computational social science and automated content analysis.
Abstract
The ever-growing textual records of contemporary social issues, often discussed online with moral rhetoric, present both an opportunity and a challenge for studying how moral concerns are debated in real life. Moral foundations theory is a taxonomy of intuitions widely used in data-driven analyses of online content, but current computational tools to detect moral foundations suffer from the incompleteness and fragility of their lexicons and from poor generalization across data domains. In this paper, we fine-tune a large language model to measure moral foundations in text based on datasets covering news media and long- and short-form online discussions. The resulting model, called Mformer, outperforms existing approaches on the same domains by 4-12% in AUC and further generalizes well to four commonly used moral text datasets, improving by up to 17% in AUC. We present case studies using Mformer to analyze everyday moral dilemmas on Reddit and controversies on Twitter, showing that moral foundations can meaningfully describe people's stances on social issues and that such variations are topic-dependent. The pre-trained model and datasets are released publicly. We posit that Mformer will help the research community quantify moral dimensions for a range of tasks and data domains, and eventually contribute to the understanding of moral situations faced by humans and machines.
