A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models

Xingmeng Zhao, Ali Niazi, Anthony Rios

TL;DR

This work investigates gender bias in chemical named entity recognition (NER) by combining a synthetic template framework with a real-world Reddit-derived dataset containing self-identified gender information. It evaluates multiple embedding regimes (Word2Vec, Flair, and (Bio)BERT) across standard chemical NER corpora (CDR, CHEMDNER, CHEBI) and two novel data sources, measuring bias as differences in precision, recall, and F1 between male- and female-associated data. Key findings show that female-associated names can be misread as chemical entities in synthetic data, and that real-world data reveal recall biases against female-related content, with substantial variation across models and datasets. The paper emphasizes the need for bias-aware model selection, data curation, and mitigation strategies to ensure fairer downstream biomedical NLP applications, including adverse drug reaction (ADR) detection and pharmacovigilance.

Abstract

Chemical named entity recognition (NER) models are used in many downstream tasks, from adverse drug reaction identification to pharmacoepidemiology. However, it is unknown whether these models work the same for everyone. Performance disparities can potentially cause harm rather than the intended good. This paper assesses gender-related performance disparities in chemical NER systems. We develop a framework for measuring gender bias in chemical NER models using synthetic data and a newly annotated corpus of over 92,405 words with self-identified gender information from Reddit. Our evaluation of multiple biomedical NER models reveals evident biases. For instance, synthetic data suggests female-related names are frequently misclassified as chemicals, especially for brand name mentions. Additionally, we observe performance disparities between female- and male-associated data in both datasets. Many systems fail to detect contraceptives such as birth control. Our findings emphasize the biases in chemical NER models, urging practitioners to account for these biases in downstream applications.
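
The performance disparities described above can be illustrated with a small sketch. It assumes entity-level true positive, false positive, and false negative counts are already available for each gender group; the names `Counts`, `prf`, and `gender_gap`, and the male-minus-female sign convention, are illustrative assumptions and not the paper's own definitions.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Entity-level error counts for one gender group (hypothetical container)."""
    tp: int  # chemical mentions correctly detected
    fp: int  # spans wrongly tagged as chemicals
    fn: int  # chemical mentions the model missed


def prf(c: Counts) -> dict:
    """Compute precision, recall, and F1, returning 0.0 when a denominator is zero."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def gender_gap(male: Counts, female: Counts) -> dict:
    """Male score minus female score for each metric (assumed sign convention).

    A positive value means the model scored higher on male-associated data.
    """
    m, f = prf(male), prf(female)
    return {name: m[name] - f[name] for name in m}


if __name__ == "__main__":
    # Made-up counts, for illustration only; not results from the paper.
    gap = gender_gap(Counts(tp=80, fp=20, fn=20), Counts(tp=60, fp=20, fn=40))
    for name, value in gap.items():
        print(f"{name}: {value:+.3f}")
```

A recall gap larger than the precision gap, as in the made-up counts here, would correspond to the kind of recall bias the summary reports for real-world data: the model misses more chemical mentions in one group without tagging many more spurious ones.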

Paper Structure

This paper contains 25 sections, 2 equations, 1 figure, 7 tables.

Figures (1)

  • Figure 1: Ratio of false negatives for various drug categories. The ratio is represented next to each bar. For female-leaning errors, the female false negative count ($FN^k_f$) is in the numerator. For male-leaning errors, the male false negative count ($FN^k_m$) is in the numerator.
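
The ratio described in the Figure 1 caption can be sketched as follows. This is a minimal illustration that assumes the direction of an error is decided by whichever group has the larger false negative count for drug category $k$, so the plotted ratio is always at least 1; the function name `fn_ratio` and the handling of ties and zero counts are assumptions, not details taken from the paper.

```python
import math


def fn_ratio(fn_female: int, fn_male: int) -> tuple:
    """Return (direction, ratio) for one drug category.

    The larger false negative count goes in the numerator, so the ratio is
    FN_f / FN_m for female-leaning errors and FN_m / FN_f for male-leaning ones.
    """
    if fn_female == fn_male:
        # Equal counts (including 0 and 0) lean in neither direction.
        return ("neutral", 1.0)
    if fn_female > fn_male:
        ratio = fn_female / fn_male if fn_male else math.inf
        return ("female", ratio)
    ratio = fn_male / fn_female if fn_female else math.inf
    return ("male", ratio)


if __name__ == "__main__":
    # Made-up counts per category, for illustration only.
    examples = {"category_a": (30, 10), "category_b": (5, 20)}
    for name, (fn_f, fn_m) in examples.items():
        direction, ratio = fn_ratio(fn_f, fn_m)
        print(f"{name}: {direction}-leaning, ratio {ratio:.1f}")
```

Raw counts are used here for simplicity; if the two groups contain different numbers of mentions, normalizing each count by its group size before taking the ratio would be a reasonable variation.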