Metamorphic Testing for Fairness Evaluation in Large Language Models: Identifying Intersectional Bias in LLaMA and GPT

Harishwar Reddy, Madhusudan Srinivasan, Upulee Kanewala

TL;DR

This work evaluates fairness in large language models by applying metamorphic testing, with dedicated metamorphic relations, to identify intersectional bias in LLaMA 3 and GPT-4. The authors generate source and follow-up test cases that vary sensitive attributes, then compare the sentiment and tone of the outputs to detect bias patterns. They find that tone-based measures are more sensitive to fairness faults than sentiment-based ones. They report that certain metamorphic relations (MR4, MR2, MR17) consistently reveal biases, and they highlight interactions among attributes such as religion, occupation, and economic status. The study provides a structured framework for fairness testing in LLMs, supporting robust deployment in sensitive domains, and suggests extending MT-based fairness testing to additional architectures and broader fairness metrics.

Abstract

Large Language Models (LLMs) have made significant strides in Natural Language Processing but remain vulnerable to fairness-related issues, often reflecting biases inherent in their training data. These biases pose risks, particularly when LLMs are deployed in sensitive areas such as healthcare, finance, and law. This paper introduces a metamorphic testing (MT) approach to systematically identify fairness bugs in LLMs. We define and apply a set of fairness-oriented metamorphic relations (MRs) to assess the LLaMA and GPT models, two state-of-the-art LLMs, across diverse demographic inputs. Our methodology includes generating source and follow-up test cases for each MR and analyzing model responses for fairness violations. The results demonstrate the effectiveness of MT in exposing bias patterns, especially in relation to tone and sentiment, and highlight specific intersections of sensitive attributes that frequently reveal fairness faults. This research improves fairness testing in LLMs, providing a structured approach to detect and mitigate biases and improve model robustness in fairness-sensitive applications.
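
The source/follow-up comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt template, the `toy_model` and `toy_scorer` stand-ins, and the 0.2 threshold are all hypothetical, chosen only to show the shape of a fairness-oriented metamorphic check. In practice `model` would call an LLM and `scorer` would be a sentiment or tone classifier.

```python
from typing import Callable

# Hypothetical prompt template; the paper's actual prompts are not shown here.
TEMPLATE = "Suggest a career plan for {person}."


def build_pair(base: str, attribute: str) -> tuple[str, str]:
    """Return (source, follow-up) prompts differing only in a sensitive attribute."""
    source = TEMPLATE.format(person=f"a {base}")
    follow_up = TEMPLATE.format(person=f"a {attribute} {base}")
    return source, follow_up


def violates_relation(
    model: Callable[[str], str],
    scorer: Callable[[str], float],
    source: str,
    follow_up: str,
    threshold: float = 0.2,  # assumed tolerance, not taken from the paper
) -> bool:
    """Flag a fairness fault when the two responses score too differently."""
    gap = abs(scorer(model(source)) - scorer(model(follow_up)))
    return gap > threshold


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call that is deliberately biased for demonstration."""
    if "low-income" in prompt:
        return "Consider a basic option."
    return "Consider a great, excellent option."


def toy_scorer(text: str) -> float:
    """Stand-in for a sentiment/tone scorer: fraction of positive words."""
    positive = {"great", "excellent"}
    words = [w.strip(".,").lower() for w in text.split()]
    return sum(w in positive for w in words) / len(words)


if __name__ == "__main__":
    src, fup = build_pair("person", "low-income")
    print(violates_relation(toy_model, toy_scorer, src, fup))
```

The relation encoded here is that adding a sensitive attribute to the prompt should leave the score of the response roughly unchanged. A gap above the threshold is reported as a fairness fault.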

Paper Structure

This paper contains 28 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 2: Fault Detection by MRs for LLaMA 3
  • Figure 3: Fault Detection by MRs for GPT-4
  • Figure 4: Total Fairness Faults Detected for Each Sensitive Attribute Combination Across MRs in Intersectional Bias Analysis of the LLaMA 3 Model