Measuring Moral Dimensions in Social Media with Mformer

Tuan Dung Nguyen, Ziyu Chen, Nicholas George Carroll, Alasdair Tran, Colin Klein, Lexing Xie

TL;DR

Mformer addresses the fragility and domain dependence of lexicon-based moral foundation detection by fine-tuning a RoBERTa model on diverse data sources. It consists of five binary classifiers, one per moral foundation, and outperforms word-count, embedding-similarity, and logistic regression baselines: AUC improves by 4-12% in-domain and by up to 17% on four external datasets. Case studies on Reddit and Twitter show that moral rhetoric varies with topic and stance, and that the results differ from prior lexicon-based analyses. The models and datasets are released publicly, and the authors argue that Mformer supports robust, reproducible measurement of moral dimensions across data domains, with applications in computational social science and automated content analysis.

Abstract

The ever-growing textual records of contemporary social issues, often discussed online with moral rhetoric, present both an opportunity and a challenge for studying how moral concerns are debated in real life. Moral foundations theory is a taxonomy of intuitions widely used in data-driven analyses of online content, but current computational tools to detect moral foundations suffer from the incompleteness and fragility of their lexicons and from poor generalization across data domains. In this paper, we fine-tune a large language model to measure moral foundations in text based on datasets covering news media and long- and short-form online discussions. The resulting model, called Mformer, outperforms existing approaches on the same domains by 4-12% in AUC and further generalizes well to four commonly used moral text datasets, improving by up to 17% in AUC. We present case studies using Mformer to analyze everyday moral dilemmas on Reddit and controversies on Twitter, showing that moral foundations can meaningfully describe people's stance on social issues and that such variations are topic-dependent. The pre-trained model and datasets are released publicly. We posit that Mformer will help the research community quantify moral dimensions for a range of tasks and data domains, and eventually contribute to the understanding of moral situations faced by humans and machines.
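The AUC comparisons reported above treat each moral foundation as a separate binary task. As a minimal sketch of that kind of evaluation (not the authors' pipeline), the following computes AUC per foundation from gold labels and classifier scores. The function names and the toy data are hypothetical; AUC is computed directly from its definition as the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
from typing import Dict, Sequence

# The five foundations of moral foundations theory; one binary task each.
FOUNDATIONS = ("authority", "care", "fairness", "loyalty", "sanctity")


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_per_foundation(
    gold: Dict[str, Sequence[int]],
    pred: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score each foundation independently (hypothetical helper)."""
    return {
        f: roc_auc(gold[f], pred[f])
        for f in FOUNDATIONS
        if f in gold and f in pred
    }


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    gold = {"care": [0, 0, 1, 1], "authority": [1, 0, 1, 0]}
    pred = {"care": [0.1, 0.4, 0.35, 0.8], "authority": [0.9, 0.2, 0.6, 0.3]}
    for foundation, auc in auc_per_foundation(gold, pred).items():
        print(f"{foundation}: AUC = {auc:.2f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; for real test sets a rank-based implementation such as the one in scikit-learn would be the usual choice.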

Paper Structure (64 sections, 7 equations, 16 figures, 13 tables)

Figures (16)

  • Figure 1: Three existing lexicons (MFD, MFD 2.0, and eMFD) used for word counting in detecting moral foundations. Left: Venn diagram depicting the sizes of these lexicons with example words. Right: the 10 most popular words for two moral foundations (authority and care) in each lexicon that are found in 6,800 r/AmItheAsshole posts in the topic "family". See \ref{sec:wordcount_limits}.
  • Figure 2: AUC on the Twitter (top row), news (middle row), and Reddit (bottom row) portions of the test set for six moral foundation scoring methods: MFD, MFD 2.0, eMFD, embedding similarity, logistic regression, and Mformer.
  • Figure 3: AUC on four external datasets for six moral foundation scoring methods: MFD, MFD 2.0, eMFD, embedding similarity, logistic regression, and Mformer.
  • Figure 4: Posts (top) and verdicts (bottom) in the (family, marriage) topic pair on AITA. Each number in a radar plot indicates the proportion of posts (or verdicts) that contain the corresponding moral foundation. The moral foundations are detected by two methods: MFD 2.0 and Mformer. Red (resp. blue) indicates negative (resp. positive) verdicts.
  • Figure 5: Example controversial thread on AITA. Top left: the post, including its title in bold and body text. Top right: odds ratios and 95% CIs between the presence of a moral foundation in a judgment and the judgment's valence. Values above the dashed horizontal line indicate that a foundation is associated with positive valence (i.e., "NTA" or "NAH"), while values below the dashed line indicate an association with a negative (i.e., "YTA" or "ESH") judgment. Bottom: three judgments for this post. The foundations contained in each judgment are annotated at the top.
  • ...and 11 more figures
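
The odds ratios and 95% CIs described in the Figure 5 caption relate the presence of a moral foundation in a judgment to that judgment's valence. The caption does not say how the intervals were computed, so the following is only a sketch under assumptions: it uses the standard Wald interval on the log odds ratio, and it adds 0.5 to every cell when a cell is zero (the Haldane-Anscombe correction). The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(
    with_pos: int,      # judgments containing the foundation, positive valence
    with_neg: int,      # judgments containing the foundation, negative valence
    without_pos: int,   # judgments without the foundation, positive valence
    without_neg: int,   # judgments without the foundation, negative valence
    z: float = 1.96,    # normal quantile for a 95% interval
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) using a Wald interval."""
    cells = [float(with_pos), float(with_neg),
             float(without_pos), float(without_neg)]
    if any(c < 0 for c in cells):
        raise ValueError("counts must be non-negative")
    # Assumption: Haldane-Anscombe correction to avoid division by zero.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Invented counts for a single foundation in a single thread.
    or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {or_:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Under this reading, an odds ratio above 1 means the foundation appears more often in positive judgments ("NTA" or "NAH"), and an interval that excludes 1 suggests the association is unlikely to be due to chance alone.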