To Impute or Not: Recommendations for Multibiometric Fusion
Melissa R. Dale, Elliot Singer, Bengt J. Borgström, Arun Ross
TL;DR
This work addresses missing match scores in multibiometric score-level fusion by systematically evaluating univariate and multivariate imputation strategies on three multimodal datasets. It simulates missingness rates of up to 90% and analyzes how training data balance and inter-modality correlations affect imputation effectiveness. MICE-based multivariate methods, especially those using Bayesian Ridge regression, show robust gains when scores are missing. The findings emphasize that imputation outperforms simply discarding incomplete vectors, that balancing the training set mitigates biases toward overrepresented classes, and that the choice between multivariate and univariate imputation should reflect inter-modality score correlations. The results offer practical guidance for deploying imputation in real-world biometric fusion systems and point to future work on label-free, hybrid, and context-aware imputation strategies.
Abstract
Combining match scores from different biometric systems via fusion is a well-established approach to improving recognition accuracy. However, missing scores can degrade performance and limit the fusion techniques that can be applied. Imputation is a promising technique for replacing missing data in multibiometric systems. In this paper, we evaluate various score imputation approaches on three multimodal biometric score datasets, viz. NIST BSSR1, BIOCOP2008, and MIT LL Trimodal, and investigate the factors that may influence the effectiveness of imputation. Our studies reveal three key observations: (1) Imputation is preferable to not imputing missing scores, even when the fusion rule does not require complete score data. (2) Balancing the classes in the training data is crucial to mitigate biases in the imputation technique against the under-represented class, even if it involves dropping a substantial number of score vectors. (3) Multivariate imputation approaches seem to be beneficial when scores across modalities are correlated, while univariate approaches seem to benefit scenarios where scores across modalities are less correlated.
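To make the distinction between univariate and multivariate imputation concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes two modalities, uses `None` as a hypothetical marker for a missing score, fills scores univariately with the per-modality training mean, and uses a single least-squares line as a deliberately simplified stand-in for a multivariate imputer such as MICE with Bayesian Ridge regression. All function names are illustrative.

```python
import random
from statistics import mean

MISSING = None  # hypothetical marker for a missing match score


def balance_classes(vectors, labels, seed=0):
    """Downsample the larger class so genuine (1) and impostor (0) counts match."""
    rng = random.Random(seed)
    genuine = [v for v, y in zip(vectors, labels) if y == 1]
    impostor = [v for v, y in zip(vectors, labels) if y == 0]
    n = min(len(genuine), len(impostor))
    return rng.sample(genuine, n) + rng.sample(impostor, n)


def column_means(train):
    """Mean of the observed training scores for each modality."""
    n_modalities = len(train[0])
    return [mean(v[j] for v in train if v[j] is not MISSING)
            for j in range(n_modalities)]


def impute_univariate(vec, means):
    """Fill each missing score with that modality's training mean."""
    return [means[j] if s is MISSING else s for j, s in enumerate(vec)]


def fit_line(train, src, dst):
    """Least-squares fit dst = a * src + b on rows where both scores exist."""
    pairs = [(v[src], v[dst]) for v in train
             if v[src] is not MISSING and v[dst] is not MISSING]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx
    return a, my - a * mx


def impute_regression(vec, line, src, dst, means):
    """Predict a missing dst score from src; fall back to the mean otherwise."""
    out = impute_univariate(vec, means)
    if vec[dst] is MISSING and vec[src] is not MISSING:
        a, b = line
        out[dst] = a * vec[src] + b
    return out


# Toy training scores in which the second modality equals 2 * first + 1.
train = [[0, 1], [1, 3], [2, 5], [3, 7]]
means = column_means(train)
line = fit_line(train, src=0, dst=1)
probe = [1, MISSING]
mean_filled = impute_univariate(probe, means)
regression_filled = impute_regression(probe, line, 0, 1, means)
```

In this toy data the two modalities are perfectly correlated, so the regression fill recovers the score implied by the observed modality, whereas the mean fill ignores it. When modalities are weakly correlated the fitted line carries little information and the mean fill may be the safer choice, which is in the spirit of observation (3).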
