Fairness Metric Design Exploration in Multi-Domain Moral Sentiment Classification using Transformer-Based Models

Battemuulen Naranbat, Seyed Sahand Mohammadi Ziabari, Yousuf Nasser Al Husaini, Ali Mohammed Mansoor Alsahag

TL;DR

This work addresses fairness in transformer-based moral sentiment classification under cross-domain shifts by evaluating BERT and DistilBERT on two domain-specific datasets, MFTC and MFRC, in a multi-label setting. It reveals pronounced, asymmetric transfer gaps and per-label fairness disparities that aggregate metrics miss, notably for the Authority foundation. The authors introduce Moral Fairness Consistency (MFC), a cross-domain diagnostic that measures stability of moral foundation detection across platforms, and demonstrate its strong negative correlation with Demographic Parity Difference and Equalized Odds Difference, while remaining independent of standard performance metrics. The findings advocate for fairness-aware evaluation in moral NLP and position MFC as a practical tool to guide deployment across heterogeneous linguistic contexts and beyond this specific task.

Abstract

Ensuring fairness in natural language processing for moral sentiment classification is challenging, particularly under cross-domain shifts where transformer models are increasingly deployed. Using the Moral Foundations Twitter Corpus (MFTC) and Moral Foundations Reddit Corpus (MFRC), this work evaluates BERT and DistilBERT in a multi-label setting with in-domain and cross-domain protocols. Aggregate performance can mask disparities: we observe pronounced asymmetry in transfer, with Twitter->Reddit degrading micro-F1 by 14.9% versus only 1.5% for Reddit->Twitter. Per-label analysis reveals fairness violations hidden by overall scores; notably, the authority label exhibits Demographic Parity Differences of 0.22-0.23 and Equalized Odds Differences of 0.40-0.41. To address this gap, we introduce the Moral Fairness Consistency (MFC) metric, which quantifies the cross-domain stability of moral foundation detection. MFC shows strong empirical validity, achieving a perfect negative correlation with Demographic Parity Difference (rho = -1.000, p < 0.001) while remaining independent of standard performance metrics. Across labels, loyalty demonstrates the highest consistency (MFC = 0.96) and authority the lowest (MFC = 0.78). These findings establish MFC as a complementary, diagnosis-oriented metric for fairness-aware evaluation of moral reasoning models, enabling more reliable deployment across heterogeneous linguistic contexts.
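To make the two per-label fairness quantities in the abstract concrete, the sketch below computes Demographic Parity Difference and Equalized Odds Difference for a single binary label using their standard textbook definitions. It is an illustration, not the authors' implementation: treating the source platform ("twitter" or "reddit") as the group attribute is an assumption, the function names are hypothetical, and the toy predictions are invented for the example. MFC is not shown because its formula does not appear in this section.

```python
def positive_rate(preds):
    """Fraction of predictions equal to 1; 0.0 for an empty list."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    by_group = {}
    for p, g in zip(y_pred, groups):
        by_group.setdefault(g, []).append(p)
    rates = [positive_rate(v) for v in by_group.values()]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the true-positive-rate gap and the false-positive-rate gap."""
    pos, neg = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        # Predictions on true positives feed TPR; on true negatives, FPR.
        (pos if t == 1 else neg).setdefault(g, []).append(p)
    names = set(groups)
    tprs = [positive_rate(pos.get(g, [])) for g in names]
    fprs = [positive_rate(neg.get(g, [])) for g in names]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy data for one label (invented, assumption: group = source platform).
groups = ["twitter"] * 4 + ["reddit"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

dpd = demographic_parity_difference(y_pred, groups)
eod = equalized_odds_difference(y_true, y_pred, groups)
print(f"DPD = {dpd:.2f}, EOD = {eod:.2f}")
```

In a multi-label setting such as the one described, these functions would be applied once per moral foundation label, which is how a single label like authority can show large gaps that an aggregate score hides.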

Paper Structure

This paper contains 20 sections, 6 equations, 16 figures, and 7 tables.

Figures (16)

  • Figure 1: Comparison of label distributions in the two corpora after label harmonization to 5 morality labels.
  • Figure 2: Methodology Pipeline
  • Figure 3: Original Moral Label Distribution of MFRC
  • Figure 4: Original Moral Label Distribution of MFTC
  • Figure 5: Individual Moral Label Distribution of MFTC
  • ...and 11 more figures