Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning
Jeongwoo Park, Enrico Liscio, Pradeep K. Murukannaiah
TL;DR
This paper argues that morality should be treated as a pluralist construct rather than as a binary right/wrong judgment. It uses a contrastive-learning framework (SimCSE) to build a pluralist moral sentence-embedding space aligned with Moral Foundations Theory (MFT), trained on the Moral Foundations Twitter Corpus (MFTC). Through intrinsic analyses (visualization and moral similarity) and extrinsic evaluations (generalization to a test set and alignment with the MFD2.0 lexicon), the study shows that supervised, label-aware training is necessary to disentangle MFT elements and reveal their relationships. The resulting embeddings exhibit meaningful virtue–vice structure and map closely to an independent moral lexicon, which suggests they may transfer to other tasks, while self-supervision alone proves insufficient. The work lays groundwork for more nuanced moral reasoning in language models and points to future work on broader datasets and cultural considerations.
Abstract
Recent advances in NLP show that language models retain a discernible level of knowledge of deontological ethics and moral norms. However, existing works often treat morality as binary, ranging from right to wrong. This simplistic view does not capture the nuances of moral judgment. Pluralist moral philosophers argue that human morality can be deconstructed into a finite number of elements, respecting individual differences in moral judgment. In line with this view, we build a pluralist moral sentence embedding space via a state-of-the-art contrastive learning approach. We systematically investigate the embedding space by studying the emergence of relationships among moral elements, both quantitatively and qualitatively. Our results show that a pluralist approach to morality can be captured in an embedding space. However, moral pluralism is challenging to deduce via self-supervision alone and requires a supervised approach with human labels.
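
To make the idea of label-aware contrastive training concrete, the following is a minimal sketch of a generic supervised contrastive (InfoNCE-style) objective, in which sentences sharing a moral label are pulled together and all others are pushed apart. It is an illustration under stated assumptions, not the authors' exact loss: the function names, the toy 2-D embeddings, the label strings, and the temperature of 0.05 are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.05):
    """Generic label-aware contrastive loss (illustrative, not the paper's code).

    For each anchor, every other sentence with the same label is a positive;
    all remaining sentences in the batch act as negatives.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        sims = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor sentences.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D "sentence embeddings" forming two tight clusters (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Labels that agree with the clusters versus labels that cut across them.
aligned = supervised_contrastive_loss(
    toy_embeddings, ["care", "care", "fairness", "fairness"]
)
mixed = supervised_contrastive_loss(
    toy_embeddings, ["care", "fairness", "care", "fairness"]
)
print(aligned < mixed)
```

The loss is low when the geometry of the space already agrees with the human labels and high when it does not, which is the training signal a supervised approach adds over self-supervision alone.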
