Adaptable Moral Stances of Large Language Models on Sexist Content: Implications for Society and Gender Discourse

Rongchen Guo; Isar Nejadgholi; Hillary Dawkins; Kathleen C. Fraser; Svetlana Kiritchenko

TL;DR

This paper examines how eight large language models generate morally grounded explanations about implicit sexist content, using the Explainable Detection of Online Sexism (EDOS) dataset and grounding arguments in Moral Foundations Theory. Through both human and automatic evaluations, it shows that while LLMs can produce fluent, contextually relevant defenses and criticisms, they can also justify sexist views, revealing divergent moral reasoning patterns across models. The study highlights significant societal implications: LLMs can help educators and moderators understand the roots of sexist beliefs and inform interventions, but they also risk amplifying or legitimizing sexism if misused. It emphasizes safety mechanisms, monitoring, and nuanced prompting as necessary to harness the potential educational benefits while mitigating harms in gendered discourse.

Abstract

This work provides an explanatory view of how LLMs can apply moral reasoning to both criticize and defend sexist language. We assessed eight large language models, all of which demonstrated the capability to provide explanations grounded in varying moral perspectives for both critiquing and endorsing views that reflect sexist assumptions. With both human and automatic evaluation, we show that all eight models produce comprehensible and contextually relevant text, which is helpful in understanding diverse views on how sexism is perceived. Also, through analysis of moral foundations cited by LLMs in their arguments, we uncover the diverse ideological perspectives in models' outputs, with some models aligning more with progressive or conservative views on gender roles and sexism. Based on our observations, we caution against the potential misuse of LLMs to justify sexist language. We also highlight that LLMs can serve as tools for understanding the roots of sexist beliefs and designing well-informed interventions. Given this dual capacity, it is crucial to monitor LLMs and design safety mechanisms for their use in applications that involve sensitive societal topics, such as sexism.

Paper Structure (18 sections, 4 figures, 9 tables)

Introduction
Methods
Dataset
LLM Selection and Prompt Engineering
Results
Detection of Implicit Sexism
Generation Quality Evaluation
Analysis of Cited Moral Foundations
Discussion
Related Work
Conclusion
Selected Language Models
Prompts for Applying MFT for Explanations
LLM Generation Parameters
Binary Classification of Sexist Language
...and 3 more sections

Figures (4)

Figure 1: Example of summarized explanations generated by LLMs. While the quality of the generations varies, the models reflect opposite perspectives, including harmful moral justifications of sexism. The full set of generated explanations is available at https://huggingface.co/datasets/mft-moral/edos-sup
Figure 2: Percentage of explanations that use each moral foundation. Blue and red represent criticizing and defending sexism, respectively.
Figure 3: Breakdown of moral value frequencies for each EDOS sub-category. Bluish and reddish heatmaps represent the cases of criticizing and defending the sentences, respectively.
Figure G.1: Occurrences of terms corresponding to the MFT dimensions in Zephyr's fine-tuning sets.
