On the Reliability of Biometric Datasets: How Much Test Data Ensures Reliability?
Matin Fallahi, Ragini Ramesh, Pankaja Priya Ramasamy, Patricia Arias Cabarcos, Thorsten Strufe, Philipp Terhörst
TL;DR
The paper addresses the lack of reliability reporting in biometric performance evaluation by introducing BioQuake, a binomial-based uncertainty metric that quantifies how far the true error rate may deviate from the observed rate at a given confidence level and sample size. It formalizes the method, derives practical uncertainty guidelines, and provides an online tool. BioQuake is empirically validated on four systems and three datasets, then applied to 62 biometric datasets spanning eight modalities. The findings show that many reported state-of-the-art results could substantially misrepresent true performance, underscoring the need for standardized reliability reporting. The work offers a concrete framework and tools to promote more reliable biometric evaluations and fairer cross-study comparisons.
Abstract
Biometric authentication is increasingly popular for its convenience and accuracy. However, while recent advancements focus on reducing errors and expanding modalities, the reliability of reported performance metrics often remains overlooked. Understanding reliability is critical, as it communicates how accurately reported error rates represent a system's actual performance, considering the uncertainty in error-rate estimates from test data. Currently, there is no widely accepted standard for reporting these uncertainties, and biometric studies rarely provide reliability estimates, which limits comparability and interpretation. To address this gap, we introduce BioQuake, a measure for estimating uncertainty in biometric verification systems, and empirically validate it on four systems and three datasets. Based on BioQuake, we provide simple guidelines for estimating performance uncertainty and facilitating reliable reporting. Additionally, we apply BioQuake to analyze biometric recognition performance on 62 biometric datasets used in research across eight modalities: face, fingerprint, gait, iris, keystroke, eye movement, electroencephalogram (EEG), and electrocardiogram (ECG). Our analysis shows that reported state-of-the-art performance often deviates significantly from actual error rates, potentially leading to inaccurate conclusions. To support researchers and foster the development of more reliable biometric systems and datasets, we release BioQuake as an easy-to-use web tool for reliability calculations.
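To illustrate the kind of question a binomial-based uncertainty measure answers, the sketch below computes a standard Wilson score interval for an observed error rate. It is not the BioQuake formula, which is not reproduced here; the function name `wilson_interval` and its parameters are hypothetical, and the sketch assumes each comparison is an independent Bernoulli trial, which real biometric test sets with repeated subjects may violate.

```python
import math
from statistics import NormalDist


def wilson_interval(errors: int, trials: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial error rate.

    Illustrative only: assumes independent trials and is a generic
    textbook interval, not the BioQuake measure itself.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= errors <= trials:
        raise ValueError("errors must lie between 0 and trials")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Two-sided standard normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = errors / trials

    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom

    # Clamp to the valid probability range.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Same observed 1% error rate, two different test-set sizes.
small = wilson_interval(errors=10, trials=1_000)
large = wilson_interval(errors=1_000, trials=100_000)
print(f"n=1,000:   [{small[0]:.4f}, {small[1]:.4f}]")
print(f"n=100,000: [{large[0]:.4f}, {large[1]:.4f}]")
```

Under these assumptions, the same observed error rate yields a much wider interval on the smaller test set, which is why a reported rate alone says little about how close it is to the true rate.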
