A Gray Literature Study on Fairness Requirements in AI-enabled Software Engineering
Thanh Nguyen, Chaima Boufaied, Ronnie de Souza Santos
TL;DR
This gray literature review investigates how fairness requirements in AI/ML are defined, managed, and violated across the software development lifecycle. Using a systematic extraction and coding of 65 gray-literature sources, the study reveals context-dependent fairness definitions, a predominantly reactive emphasis on model training and monitoring, and key causes (data biases, model design flaws, human factors) that lead to harms such as biased predictions and loss of trust. A co-occurrence analysis highlights that data and model biases strongly drive harmful outcomes, reinforcing the need for fairness-by-design and proactive governance across SDLC stages. The findings advocate for explicit, context-aware fairness requirements and cross-disciplinary practices to mitigate social and ethical risks in AI-enabled software engineering. The work also provides a replication package and suggests a framework for systematic fairness requirements throughout the lifecycle.
Abstract
Today, amid the growing obsession with applying Artificial Intelligence (AI), particularly Machine Learning (ML), to software across various contexts, much of the focus has been on the effectiveness of AI models, often measured through common metrics such as F1-score, while fairness receives relatively little attention. This paper presents a review of existing gray literature, examining fairness requirements in the AI context, with a focus on how they are defined across various application domains, how they are managed throughout the Software Development Life Cycle (SDLC), and the causes and corresponding consequences of their violation by AI models. Our gray literature investigation shows various definitions of fairness requirements in AI systems, commonly emphasizing non-discrimination and equal treatment across different demographic and social attributes. Fairness requirement management practices vary across the SDLC, particularly in model training and bias mitigation, fairness monitoring and evaluation, and data handling practices. Fairness requirement violations are frequently linked, but not limited, to data representation bias, algorithmic and model design bias, human judgment, and evaluation and transparency gaps. The corresponding consequences include harm in a broad sense (with specific professional and societal impacts as key examples), stereotype reinforcement, data and privacy risks, and loss of trust and legitimacy in AI-supported decisions. These findings emphasize the need for consistent frameworks and practices to integrate fairness into AI software, paying as much attention to fairness as to effectiveness.
