On the Reliability of Biometric Datasets: How Much Test Data Ensures Reliability?

Matin Fallahi; Ragini Ramesh; Pankaja Priya Ramasamy; Patricia Arias Cabarcos; Thorsten Strufe; Philipp Terhörst

TL;DR

The paper addresses the lack of reliability reporting in biometric performance evaluation by introducing BioQuake, a binomial-based uncertainty metric that quantifies how far the true error rate may deviate from the observed rate at a given confidence level and sample size. It formalizes the method, derives practical uncertainty rules, and provides an online tool. BioQuake is validated empirically on four systems and three datasets and then applied to 62 biometric datasets spanning eight modalities. The findings show that many reported state-of-the-art results could substantially misrepresent true performance, underscoring the need for standardized reliability reporting. The work offers a concrete framework and tools to promote more reliable biometric evaluations and fairer cross-study comparisons.
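As a concrete illustration of a binomial-based uncertainty, the sketch below treats each comparison as an independent trial and computes a confidence interval around an observed error rate. The exact BioQuake formula is not given in this summary, so the code assumes a normal-approximation (Wald) binomial interval; `wald_half_width` and `relative_uncertainty` are hypothetical names, and expressing the uncertainty relative to the observed rate is an assumption made for illustration.

```python
from statistics import NormalDist


def wald_half_width(error_rate: float, n: int, alpha: float = 0.05) -> float:
    """Half-width of a two-sided normal-approximation (Wald) binomial
    confidence interval for an error rate measured on n comparisons.

    Assumption: this Wald model stands in for BioQuake's exact definition.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return z * (error_rate * (1.0 - error_rate) / n) ** 0.5


def relative_uncertainty(error_rate: float, n: int, alpha: float = 0.05) -> float:
    """Hypothetical relative uncertainty: half-width divided by the observed rate."""
    if error_rate <= 0.0:
        raise ValueError("relative uncertainty is undefined for a zero error rate")
    return wald_half_width(error_rate, n, alpha) / error_rate


# Example: an observed FMR of 0.1% measured on 100,000 impostor comparisons.
fmr, n = 0.001, 100_000
hw = wald_half_width(fmr, n)
print(f"FMR = {fmr:.4%} +/- {hw:.4%} (95% confidence)")
print(f"relative uncertainty = {relative_uncertainty(fmr, n):.1%}")
```

For an observed FMR of 0.1% on 100,000 impostor comparisons, the half-width comes out at roughly 0.0196 percentage points, about 20% of the observed rate. The Wald interval is chosen here only for readability; it is known to behave poorly when very few errors are observed, where an exact (Clopper-Pearson) or Wilson interval is usually preferred.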

Abstract

Biometric authentication is increasingly popular for its convenience and accuracy. However, while recent advancements focus on reducing errors and expanding modalities, the reliability of reported performance metrics often remains overlooked. Understanding reliability is critical, as it communicates how accurately reported error rates represent a system's actual performance, given the uncertainty in error-rate estimates from test data. Currently, there is no widely accepted standard for reporting these uncertainties, and biometric studies rarely provide reliability estimates, limiting comparability and interpretation. To address this gap, we introduce BioQuake, a measure to estimate uncertainty in biometric verification systems, and empirically validate it on four systems and three datasets. Based on BioQuake, we provide simple guidelines for estimating performance uncertainty and facilitating reliable reporting. Additionally, we apply BioQuake to analyze biometric recognition performance on 62 biometric datasets used in research across eight modalities: face, fingerprint, gait, iris, keystroke, eye movement, Electroencephalogram (EEG), and Electrocardiogram (ECG). Our analysis shows that reported state-of-the-art performance often deviates significantly from actual error rates, potentially leading to inaccurate conclusions. To support researchers and foster the development of more reliable biometric systems and datasets, we release BioQuake as an easy-to-use web tool for reliability calculations.
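The title question, how much test data ensures reliability, can be approached by inverting a binomial uncertainty model: fix a target uncertainty δ and a significance level α, then solve for the number of comparisons n. The sketch below assumes a normal-approximation (Wald) binomial interval and treats δ as a fraction of the observed error rate; `required_comparisons` is a hypothetical helper, not the paper's published guideline. The loop uses the three confidence levels shown in Figure 1.

```python
import math
from statistics import NormalDist


def required_comparisons(error_rate: float, delta: float, alpha: float = 0.05) -> int:
    """Smallest n whose Wald half-width is at most delta * error_rate.

    Assumption: delta is a relative uncertainty. Solving
        z * sqrt(p * (1 - p) / n) <= delta * p
    for n gives n >= z**2 * (1 - p) / (delta**2 * p).
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return math.ceil(z**2 * (1.0 - error_rate) / (delta**2 * error_rate))


# Confidence levels of 90%, 95%, and 99%, each for a 10% relative uncertainty.
for alpha in (0.10, 0.05, 0.01):
    for rate in (1e-2, 1e-3, 1e-4):
        n = required_comparisons(rate, delta=0.10, alpha=alpha)
        print(f"alpha={alpha:.2f}  error rate={rate:.0e}  n >= {n:,}")
```

Under this assumption n is approximately z²/(δ²p) for small error rates p, so the required test size grows roughly in inverse proportion to the error rate, and halving δ quadruples n. For example, an observed error rate of 0.1% with δ = 10% at 95% confidence requires 383,762 comparisons.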

Paper Structure (18 sections, 8 equations, 4 figures, 2 tables)

Introduction
Related Work
Confidence Estimation For Biometric Recognition
Limitations of Previous Works
Methodology
Introducing BioQuake $(\delta)$
Determining BioQuake Visually
BioQuake Rules
Certainty Classes
Upper and Lower Bounds of Uncertainty
Experimental setup
Setup A: Empirical Correctness Analysis of BioQuake
Setup B: Uncertainty Analysis of Existing Datasets
Results
Empirical Correctness Analysis of BioQuake
...and 3 more sections

Figures (4)

Figure 1: Visualization of BioQuake Uncertainty at Different Confidence Levels - The relationship between the observed error rate (FMR/FNMR) and the number of comparisons required to achieve a specific BioQuake uncertainty is shown for confidence levels of 90% ($\alpha=10\%$), 95% ($\alpha=5\%$), and 99% ($\alpha=1\%$).
Figure 2: Visualization of BioQuake Principles - The relationship between the observed error rate (FMR/FNMR) and the number of comparisons required to achieve a specific BioQuake uncertainty $\delta$ is shown (for $\alpha=0.05$). From this relationship, one can determine the number of comparisons required for a given uncertainty, as well as the significance of a measurement.
Figure 3: Comparative Analysis between the Theoretical BioQuake and Empirical Uncertainty for FMRs - For different fractions (frac) of the base datasets, the proposed theoretical BioQuake approach (dashed line) is compared against the empirical uncertainty (solid line) of FMRs on different model-dataset combinations. The close agreement between the two approaches indicates that BioQuake estimates the uncertainty effectively.
Figure 4: Comparative Analysis between the Theoretical BioQuake and Empirical Uncertainty for FNMRs - For different fractions (frac) of the base datasets, the proposed theoretical BioQuake approach (dashed line) is compared against the empirical uncertainty (solid line) of FNMRs on different model-dataset combinations. The close agreement between the two approaches indicates that BioQuake estimates the uncertainty effectively.
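Figures 3 and 4 compare the theoretical BioQuake uncertainty with an empirical uncertainty measured on fractions of real test sets. A synthetic analogue of that kind of check is sketched below: it repeatedly simulates a test of n independent comparisons at a known error rate and compares the spread of the observed rates with a theoretical interval half-width. The error rate, test sizes, number of runs, and helper names are all hypothetical; the simulation draws fresh independent comparisons rather than subsampling a real dataset as the paper does, and the theoretical side assumes a normal-approximation (Wald) binomial interval.

```python
import math
import random
from statistics import NormalDist


def wald_half_width(error_rate: float, n: int, alpha: float = 0.05) -> float:
    """Theoretical half-width of the assumed normal-approximation binomial interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * math.sqrt(error_rate * (1.0 - error_rate) / n)


def empirical_half_width(true_rate: float, n: int, runs: int,
                         alpha: float = 0.05, seed: int = 0) -> float:
    """(1 - alpha) quantile of |observed - true| over repeated synthetic tests.

    Each run simulates n independent comparisons that err with probability
    true_rate; this is a synthetic stand-in for re-sampling a real test set.
    """
    rng = random.Random(seed)  # fixed seed for reproducible output
    deviations = []
    for _ in range(runs):
        errors = sum(rng.random() < true_rate for _ in range(n))
        deviations.append(abs(errors / n - true_rate))
    deviations.sort()
    index = math.ceil((1.0 - alpha) * runs) - 1  # position of the quantile
    return deviations[index]


TRUE_RATE = 0.05  # hypothetical true error rate of a synthetic system
for n in (400, 1600, 6400):
    theory = wald_half_width(TRUE_RATE, n)
    empirical = empirical_half_width(TRUE_RATE, n, runs=400)
    print(f"n={n:>5}  theoretical={theory:.4f}  empirical={empirical:.4f}")
```

Independent draws are the idealized case a binomial model assumes. In real biometric test sets, comparisons that share a subject or sample are correlated, which can make the empirical spread wider than a binomial prediction; a check on real data, as in Figures 3 and 4, is therefore the more informative test.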
