Meta-Fair: AI-Assisted Fairness Testing of Large Language Models

Miguel Romero-Arjona; José A. Parejo; Juan C. Alonso; Ana B. Sánchez; Aitor Arrieta; Sergio Segura

Meta-Fair: AI-Assisted Fairness Testing of Large Language Models

Miguel Romero-Arjona, José A. Parejo, Juan C. Alonso, Ana B. Sánchez, Aitor Arrieta, Sergio Segura

TL;DR

Meta-Fair presents an automated fairness testing framework for large language models by fusing metamorphic testing with AI-assisted test-case generation and evaluation. It introduces a broad set of metamorphic relations and leverages LLMs as judges to achieve scalable bias detection, supported by three open-source tools for generation, execution, and evaluation. Empirical results across 12 LLMs and 14 MRs show high precision (average ~0.92) and bias in 29% of executions, with judge models reaching F1-scores up to 0.79, though non-determinism varies by MR. The work demonstrates that evaluating pairs of related prompts improves bias detection and lays a foundation for highly automated fairness testing, with practical implications for deploying safer, more trustworthy LLM systems.

Abstract

Fairness--the absence of unjustified bias--is a core principle in the development of Artificial Intelligence (AI) systems, yet it remains difficult to assess and enforce. Current approaches to fairness testing in large language models (LLMs) often rely on manual evaluation, fixed templates, deterministic heuristics, and curated datasets, making them resource-intensive and difficult to scale. This work aims to lay the groundwork for a novel, automated method for testing fairness in LLMs, reducing the dependence on domain-specific resources and broadening the applicability of current approaches. Our approach, Meta-Fair, is based on two key ideas. First, we adopt metamorphic testing to uncover bias by examining how model outputs vary in response to controlled modifications of input prompts, defined by metamorphic relations (MRs). Second, we propose exploiting the potential of LLMs for both test case generation and output evaluation, leveraging their capability to generate diverse inputs and classify outputs effectively. The proposal is complemented by three open-source tools supporting LLM-driven generation, execution, and evaluation of test cases. We report the findings of several experiments involving 12 pre-trained LLMs, 14 MRs, 5 bias dimensions, and 7.9K automatically generated test cases. The results show that Meta-Fair is effective in uncovering bias in LLMs, achieving an average precision of 92% and revealing biased behaviour in 29% of executions. Additionally, LLMs prove to be reliable and consistent evaluators, with the best-performing models achieving F1-scores of up to 0.79. Although non-determinism affects consistency, these effects can be mitigated through careful MR design. While challenges remain to ensure broader applicability, the results indicate a promising path towards an unprecedented level of automation in LLM testing.

Meta-Fair: AI-Assisted Fairness Testing of Large Language Models

TL;DR

Abstract

Meta-Fair: AI-Assisted Fairness Testing of Large Language Models

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (7)