Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts

Naseela Pervez, Alexander J. Titus

TL;DR

The paper evaluates how well three prominent LLMs (Claude 3 Opus, Mistral AI Large, and Gemini 1.5 Flash) rewrite scientific abstracts without distorting authorial narrative style or introducing gender bias. Using the CORE dataset and LIWC-22, the study extracts lexical, psychological, and social features and compares AI-generated abstracts with human-written ones via Pearson correlation and two-sample t-tests. The correlation matrices show strong diagonal values, meaning each LIWC feature in the LLM output tracks the same feature in the human text, which indicates substantial alignment with human style. The analyses also uncover gender gaps that LLMs often amplify in affect, politeness, and motivation-related words. The findings highlight the need for inclusive LLM training and deployment and offer a reproducible framework for assessing writing personality and gender bias in scientific contexts.
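
The correlation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`pearson`, `cross_correlation`) and the toy scores are assumptions made for the example, and only the feature names `Tone` and `polite` come from the paper's figure captions. It computes the Pearson coefficient for every pair of human and LLM features and then reads off the diagonal, where a feature is compared with itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cross_correlation(human, llm):
    """Correlate every human feature with every LLM feature.

    Both arguments map a feature name to a list of per-abstract scores,
    with abstracts in the same order in both dictionaries.
    """
    return {(h, m): pearson(human[h], llm[m]) for h in human for m in llm}


# Toy, made-up scores for four abstracts (hypothetical, not data from the paper).
human = {"Tone": [30.0, 45.0, 50.0, 62.0], "polite": [0.1, 0.4, 0.2, 0.5]}
llm = {"Tone": [35.0, 47.0, 55.0, 60.0], "polite": [0.2, 0.5, 0.2, 0.6]}

matrix = cross_correlation(human, llm)

# Diagonal entries: the same feature measured in human and LLM text.
diagonal = {feature: matrix[(feature, feature)] for feature in human}
for feature, r in diagonal.items():
    print(f"{feature}: r = {r:.3f}")
```

High diagonal values paired with weaker off-diagonal values are what a heatmap of `matrix` would show when the rewritten abstracts preserve the original style.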

Abstract

Large language models (LLMs) are increasingly utilized to assist in scientific and academic writing, helping authors enhance the coherence of their articles. Previous studies have highlighted stereotypes and biases present in LLM outputs, emphasizing the need to evaluate these models for their alignment with human narrative styles and potential gender biases. In this study, we assess the alignment of three prominent LLMs - Claude 3 Opus, Mistral AI Large, and Gemini 1.5 Flash - by analyzing their performance on benchmark text-generation tasks for scientific abstracts. We employ the Linguistic Inquiry and Word Count (LIWC) framework to extract lexical, psychological, and social features from the generated texts. Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases. This research highlights the importance of developing LLMs that maintain a diversity of writing styles to promote inclusivity in academic discourse.

Paper Structure

This paper contains 13 sections, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Flowchart illustrating the comparison of LIWC features in scientific abstracts written by humans and rewritten by LLMs to assess personality alignment. The same framework is adapted for the male vs. female comparison.
  • Figure 2: Heatmaps of the Pearson correlation coefficients of LIWC features between humans and LLMs: Claude, Gemini, Mistral (left to right).
  • Figure 3: Heatmaps of the significance (p-values) of the Pearson correlation coefficients of LIWC features between humans and LLMs: Claude, Gemini, Mistral (left to right).
  • Figure 4: t-statistic values of the statistically significant features ('Tone', 'achieve', 'cause', 'emotion', 'emo_pos', 'polite', 'curiosity'), which reflect gender gaps between human and LLM texts.
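
The t-statistics in Figure 4 come from two-sample t-tests on individual LIWC features. As a rough sketch of that comparison, the example below computes the statistic for one feature, once on human text and once on LLM text. The source does not say which variant of the test was used, so Welch's unequal-variance form is an assumption here, as are the function name `welch_t` and the toy scores.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic for mean(a) - mean(b).

    Assumption: unequal-variance (Welch) form; the paper only says
    "two-sample t-tests".
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy, made-up 'polite' scores per abstract (hypothetical, not data from the paper).
human_polite = {
    "female": [0.30, 0.35, 0.40, 0.45],
    "male": [0.28, 0.33, 0.38, 0.43],
}
llm_polite = {
    "female": [0.50, 0.55, 0.60, 0.65],
    "male": [0.30, 0.35, 0.40, 0.45],
}

t_human = welch_t(human_polite["female"], human_polite["male"])
t_llm = welch_t(llm_polite["female"], llm_polite["male"])

# A larger |t| on the LLM text than on the human text would suggest the
# rewrite widened the gender gap for this feature.
print(f"human t = {t_human:.2f}, LLM t = {t_llm:.2f}")
```

In practice each statistic would be paired with a p-value, and only features that pass the chosen significance threshold would be reported.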