Sum of Group Error Differences: A Critical Examination of Bias Evaluation in Biometric Verification and a Dual-Metric Measure

Alaa Elobaid, Nathan Ramoly, Lara Younes, Symeon Papadopoulos, Eirini Ntoutsi, Ioannis Kompatsiaris

TL;DR

The paper examines bias evaluation in biometric verification (BV) and shows that existing metrics often overlook bias affecting demographic groups whose performance falls between the best and worst levels, and often ignore the magnitude of the bias. It analyzes the limitations of differential performance and differential outcome metrics, introduces the Sum of Group Error Differences (SEDG) as a general, application-independent bias measure, and proposes a dual-metric framework (Average of $\mathrm{SED}_G$ and $\sigma_{\mathrm{SED}_G}$). Using synthetic BV data with single and multiple disadvantaged groups, it shows that traditional metrics can misrank bias levels or fail to differentiate them, while SEDG captures both the type and the magnitude of bias across scenarios. The work provides scenario-based metric recommendations and releases public code as a practical tool for BV bias evaluation.

Abstract

Biometric Verification (BV) systems often exhibit accuracy disparities across different demographic groups, leading to biases in BV applications. Assessing and quantifying these biases is essential for ensuring the fairness of BV systems. However, existing bias evaluation metrics in BV have limitations, such as focusing exclusively on match or non-match error rates, overlooking bias on demographic groups with performance levels falling between the best and worst performance levels, and neglecting the magnitude of the bias present. This paper presents an in-depth analysis of the limitations of current bias evaluation metrics in BV and, through experimental analysis, demonstrates their contextual suitability, merits, and limitations. Additionally, it introduces a novel general-purpose bias evaluation measure for BV, the "Sum of Group Error Differences (SEDG)". Our experimental results on controlled synthetic datasets demonstrate the effectiveness of demographic bias quantification when using existing metrics and our own proposed measure. We discuss the applicability of the bias evaluation metrics in a set of simulated demographic bias scenarios and provide scenario-based metric recommendations. Our code is publicly available under https://github.com/alaaobeid/SEDG.

Paper Structure (22 sections, 30 equations, 8 tables, 1 algorithm)