Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts
Elizabeth Schaefer, Kirk Roberts
TL;DR
The paper tackles gender bias in medical large language models by targeting gendered pronouns that refer to occupations in PubMed abstracts. It introduces MOBERT, a BERT-based model trained on gender-neutralized abstracts. These abstracts are produced by a three-stage pipeline (pronoun resolution, lexicon validation, and classification) applied to 379,000 abstracts published between 1965 and 1980. In a masked language modeling task, MOBERT achieves a 70% inclusive pronoun replacement rate. This far exceeds the 4% reached by 1965BERT, a baseline trained on the original abstracts. MOBERT's replacement accuracy also correlates with how frequently each occupational term appears in the training data. The work demonstrates the value of data-level bias mitigation for fairer medical NLP. It points to dataset expansion and broader clinical evaluation as paths for future improvement.
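To make the neutralization step concrete, here is a toy sketch in Python. It is not the authors' pipeline. The occupational lexicon `OCCUPATIONS`, the pronoun mapping `NEUTRAL`, and the helpers `resolve_antecedent` and `neutralize` are all hypothetical. A nearest-preceding-occupation heuristic stands in for real pronoun (coreference) resolution. Under these assumptions, a gendered pronoun is rewritten to singular "they" only if an occupational noun from the lexicon appears within a short window of words before it.

```python
import re

# Hypothetical occupational lexicon (illustrative only, not the paper's lexicon).
OCCUPATIONS = {"physician", "surgeon", "nurse", "doctor", "pharmacist", "investigator"}

# Assumed gendered -> inclusive mapping using singular "they".
# "her" is ambiguous (object "them" vs. possessive "their"); this sketch always
# uses "their" and does not repair verb agreement (e.g. "he is" -> "they is").
NEUTRAL = {
    "he": "they", "she": "they",
    "him": "them", "his": "their", "her": "their",
    "himself": "themselves", "herself": "themselves",
}

WORD = re.compile(r"[A-Za-z]+")


def resolve_antecedent(words, idx, window):
    """Stand-in for pronoun resolution: nearest preceding lexicon word within `window` words."""
    for j in range(idx - 1, max(-1, idx - 1 - window), -1):
        if words[j].lower() in OCCUPATIONS:
            return words[j]
    return None


def neutralize(text, window=10):
    """Rewrite gendered pronouns whose (heuristic) antecedent is an occupation."""
    matches = list(WORD.finditer(text))
    words = [m.group() for m in matches]
    pieces, last = [], 0
    for i, m in enumerate(matches):
        replacement = NEUTRAL.get(m.group().lower())
        if replacement is None or resolve_antecedent(words, i, window) is None:
            continue  # not a gendered pronoun, or no occupational antecedent found
        if m.group()[0].isupper():
            replacement = replacement.capitalize()  # preserve sentence-initial case
        pieces.append(text[last:m.start()])
        pieces.append(replacement)
        last = m.end()
    pieces.append(text[last:])
    return "".join(pieces)
```

Consider `neutralize("The surgeon said he would operate after his review.")`. It returns "The surgeon said they would operate after their review." A sentence with no occupational antecedent, such as "The patient said she felt better.", is left unchanged. A realistic pipeline would replace the window heuristic with a coreference model. It would also need to handle verb agreement and the object/possessive ambiguity of "her".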
Abstract
This paper presents a pipeline for mitigating gender bias in large language models (LLMs) applied to medical literature. The pipeline works by neutralizing gendered pronouns that refer to occupations. A dataset of 379,000 PubMed abstracts from 1965–1980 was processed to identify and modify pronouns tied to professions. We developed a BERT-based model, "Modern Occupational Bias Elimination with Refined Training" (MOBERT), trained on these neutralized abstracts. We compared its performance with 1965BERT, trained on the original dataset. On a masked language modeling task, MOBERT achieved a 70% inclusive replacement rate, while 1965BERT reached only 4%. Further analysis of MOBERT revealed that pronoun replacement accuracy correlated with the frequency of occupational terms in the training data. We propose expanding the dataset and refining the pipeline to improve performance and ensure more equitable language modeling in medical applications.
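The evaluation can be sketched as follows. This sketch assumes that the inclusive replacement rate is the fraction of top-1 masked-token predictions that are gender-neutral pronouns. The set `INCLUSIVE`, the record format, and the functions `inclusive_replacement_rate`, `per_occupation_rates`, and `pearson` are hypothetical illustrations. Obtaining the predictions from MOBERT or 1965BERT (for example, through a fill-mask interface) is outside the sketch. The Pearson correlation shows one possible way to test the reported link between occupational term frequency and replacement accuracy.

```python
import math
from collections import defaultdict

# Assumed set of inclusive single-token predictions (hypothetical definition).
INCLUSIVE = {"they", "them", "their", "theirs", "themselves"}


def inclusive_replacement_rate(predictions):
    """Fraction of top-1 [MASK] predictions that are inclusive pronouns."""
    if not predictions:
        return 0.0
    hits = sum(1 for p in predictions if p.strip().lower() in INCLUSIVE)
    return hits / len(predictions)


def per_occupation_rates(records):
    """Group (occupation, top1_prediction) pairs and score each occupation."""
    grouped = defaultdict(list)
    for occupation, prediction in records:
        grouped[occupation].append(prediction)
    return {occ: inclusive_replacement_rate(preds) for occ, preds in grouped.items()}


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)
```

The following toy inputs are not results from the paper. `inclusive_replacement_rate(["they", "he", "their", "They", "she"])` returns 0.6. `per_occupation_rates([("nurse", "they"), ("nurse", "she"), ("surgeon", "they"), ("surgeon", "them")])` returns `{"nurse": 0.5, "surgeon": 1.0}`. Term frequencies in a corpus are usually heavily skewed, so correlating per-occupation rates with log-frequency rather than raw counts may be more informative.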
