Advanced Acceptance Score: A Holistic Measure for Biometric Quantification

Aman Verma, Seshan Srirangarajan, Sumantra Dutta Roy

TL;DR

The authors present an exhaustive set of evaluation measures for biometric capacity estimation that use the ranking order and relevance of output scores as the primary basis for evaluation, and formulate the advanced acceptance score as a holistic evaluation measure.

Abstract

Quantifying biometric characteristics within hand gestures involves deriving fitness scores from a gesture- and identity-aware feature space. However, evaluating the quality of these scores remains an open question. Existing biometric capacity estimation literature relies on error rates, but these rates do not indicate the goodness of the scores. In this manuscript, we therefore present an exhaustive set of evaluation measures. We first identify the ranking order and relevance of output scores as the primary basis for evaluation. In particular, we consider rank deviation as well as rewards for (i) higher scores of high-ranked gestures and (ii) lower scores of low-ranked gestures. We also account for the correspondence between the trends of output and ground truth scores. Finally, we incorporate the disentanglement between identity features of gestures as a discounting factor. Integrating these elements with appropriate weighting, we formulate the advanced acceptance score as a holistic evaluation measure. To assess the effectiveness of the proposed measure, we perform in-depth experiments on three datasets with five state-of-the-art (SOTA) models. Results show that the optimal score selected with our measure is more appropriate than those selected by other existing measures. Our proposed measure also correlates with existing measures, which further validates its reliability. Our code is publicly available at https://github.com/AmanVerma2307/MeasureSuite.
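
The abstract names rank deviation and rewards for high scores on high-ranked gestures and low scores on low-ranked gestures, but gives no formulas here. The sketch below is a hedged illustration under stated assumptions, not the authors' definition: rank deviation is taken as Spearman's footrule (the summed absolute difference between predicted and ground-truth ranks), and the reward uses a hypothetical split of the ground-truth ranking into an upper and a lower half, with scores assumed to lie in [0, 1]. All function names are hypothetical.

```python
from typing import List, Sequence


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Return 1-based ranks where rank 1 is the highest score (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_deviation(pred: Sequence[float], truth: Sequence[float]) -> int:
    """Spearman's footrule: summed absolute difference between predicted and true ranks."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pred_ranks = ranks_from_scores(pred)
    true_ranks = ranks_from_scores(truth)
    return sum(abs(p - t) for p, t in zip(pred_ranks, true_ranks))


def relevance_reward(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Hypothetical reward in [0, 1], assuming scores lie in [0, 1].

    Gestures in the upper half of the ground-truth ranking earn their predicted
    score; gestures in the lower half earn (1 - predicted score).
    """
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and of equal length")
    n = len(truth)
    true_ranks = ranks_from_scores(truth)
    reward = 0.0
    for score, rank in zip(pred, true_ranks):
        reward += score if rank <= n // 2 else 1.0 - score
    return reward / n
```

For example, with ground-truth scores [0.9, 0.6, 0.3, 0.1] and predicted scores [0.8, 0.2, 0.5, 0.1], the middle two gestures swap ranks, so rank_deviation returns 2 and relevance_reward returns 0.6 (up to floating-point rounding).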

Paper Structure (25 sections, 19 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Comparison of the proposed $nA_{r}^{*}(\Delta)$ with existing ranking and retrieval measures. We compare the optimal score suggested by each measure in terms of four design criteria: (i) deviation in the ranking order of gestures, (ii) entanglement between the biometric traits of different gestures, (iii) trend deviation, and (iv) quality or relevance of the quantification estimates. For the optimal score, the first three should be minimal, while relevance should be maximal. All values are normalized to the range $[0,1]$. This analysis reveals that the score values selected by $nA_{r}^{*}(\Delta)$ jointly satisfy all the design criteria, which is not observed for the other measures. This merit can be attributed to the multi-faceted, task-specific evaluation.
  • Figure 2: Computation of the trend match distance ($\Psi$): $\Psi$ measures whether the DGBQA scores of two consecutively ranked gestures have the same biometric separation as the ground truth. First, all the gestures are arranged according to their ground-truth ranking order; the last gesture (here $G_3$) has the strongest biometric characteristics. To compute $\Psi$, we traverse in two directions: (i) a forward pass in increasing order (see (a)) and (ii) a backward pass in decreasing order (see (b)). The difference between consecutive DGBQA scores is used to estimate the ground truth. For each gesture, the differences between the estimated and ground truth scores from both passes are summed, and summing these contributions across all gestures gives $\Psi$.
  • Figure 3: Radar charts comparing the score selection of the advanced acceptance score and its variants. The comparison is performed in terms of our design criteria: (i) rank deviation, (ii) relevance, (iii) trend deviation, and (iv) entanglement. To clearly discriminate between the relevance scores, we use them as an exponent: $2^{\lambda\mathcal{R}}$. In comparison to its variants, the advanced acceptance score satisfies all the criteria. This can be attributed to the multi-faceted evaluation in the proposed measure. Exact numerical values are given in Table \ref{tab:compVariants}, while a qualitative comparison is illustrated in Fig. \ref{fig:MS_QA}.
  • Figure 4: Qualitative comparison between the advanced acceptance score and its variants over score/model selection (on the Soli dataset). We conduct this analysis using (i) histograms comparing the selected DGBQA scores with the ground truth (first row) and (ii) Gram matrices (to quantify entanglement). The score/model selected by $A_r^*(\Delta)$ (column (f)) has DGBQA scores comparable to the ground truth. It also attains the least entanglement (lighter values are required in the off-diagonal elements of the off-diagonal blocks). This is also observed with respect to verma2024quantifying.
  • Figure 5: Box-and-whisker plot comparing the distributions of the evaluation measures. The plot highlights that the individual measures cover different ranges, suggesting that they convey different information. $A_r^{*}(\Delta)$ fuses them to produce a broader range of values, which is pivotal for differentiating between high- and low-quality scores.
  • ...and 3 more figures
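
The Figure 1 caption states that the criterion values are normalized to $[0,1]$ but not how. Min-max scaling across the compared candidates is one plausible choice; the sketch below assumes it, and the function name is hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Linearly rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, rank deviations of 4, 10, and 7 for three candidate score sets map to 0.0, 1.0, and 0.5.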
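
The Figure 2 caption describes the trend match distance $\Psi$ procedurally. One hypothetical reading is sketched below: gestures are sorted by ground-truth score in ascending order, the forward pass predicts the next gesture's ground truth by adding the consecutive DGBQA score difference, the backward pass predicts the previous gesture's ground truth by subtracting it, and absolute errors are summed. The function name and the use of absolute error are assumptions; the paper's exact equation may differ.

```python
from typing import Sequence


def trend_match_distance(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Hypothetical reading of the trend match distance Psi.

    pred: DGBQA scores per gesture; truth: ground-truth scores per gesture.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    # Arrange gestures by ground truth, ascending: the last is the strongest.
    order = sorted(range(len(truth)), key=lambda i: truth[i])
    s = [pred[i] for i in order]
    g = [truth[i] for i in order]
    n = len(g)
    psi = 0.0
    for i in range(n):
        if i + 1 < n:
            # Forward pass: estimate gesture i+1's ground truth from gesture i.
            est_next = g[i] + (s[i + 1] - s[i])
            psi += abs(est_next - g[i + 1])
        if i - 1 >= 0:
            # Backward pass: estimate gesture i-1's ground truth from gesture i.
            est_prev = g[i] - (s[i] - s[i - 1])
            psi += abs(est_prev - g[i - 1])
    return psi
```

Under this reading, the forward and backward errors for a consecutive pair are equal, so $\Psi$ reduces to twice the summed mismatch between consecutive DGBQA gaps and ground-truth gaps. For ground truth [0.0, 0.5, 1.0] and scores [0.0, 0.1, 1.0], the function returns 1.6 (up to floating-point rounding).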
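
Figure 3 plots relevance as $2^{\lambda\mathcal{R}}$. Because the ratio of two exponentiated values is $2^{\lambda(\mathcal{R}_2-\mathcal{R}_1)}$, a small relevance gap becomes a clearly visible multiplicative gap. The value of $\lambda$ is not given in this section; $\lambda = 10$ below is an illustrative assumption, and the function name is hypothetical.

```python
def exponentiated_relevance(relevance: float, lam: float) -> float:
    """Map a relevance score R to 2 ** (lam * R) to stretch small differences."""
    return 2.0 ** (lam * relevance)


LAMBDA = 10.0  # illustrative assumption; the section does not state lambda
low = exponentiated_relevance(0.80, LAMBDA)
high = exponentiated_relevance(0.85, LAMBDA)
print(low, high, high / low)
```

With $\lambda = 10$, relevance values of 0.80 and 0.85 map to 256 and about 362, a ratio of about 1.414 (that is, $2^{0.5}$).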
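
Figure 4 uses a Gram matrix to quantify entanglement, preferring low values in the off-diagonal elements of the off-diagonal blocks. A minimal sketch follows, assuming that identity features are stacked with rows grouped by gesture and ordered by subject within each block, that Gram entries are cosine similarities, and that the mean absolute similarity over those entries is the summary statistic. All of these choices and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def gram_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine-similarity matrix (the Gram matrix of L2-normalized rows)."""
    return [[cosine_similarity(u, v) for v in features] for u in features]


def entanglement(features: Sequence[Sequence[float]], n_gestures: int, n_subjects: int) -> float:
    """Hypothetical entanglement score.

    Row r corresponds to gesture r // n_subjects and subject r % n_subjects.
    Averages |similarity| over entries whose row and column differ in both
    gesture (off-diagonal block) and subject (off-diagonal element).
    """
    if len(features) != n_gestures * n_subjects:
        raise ValueError("expected n_gestures * n_subjects feature rows")
    gram = gram_matrix(features)
    total, count = 0.0, 0
    for r, row in enumerate(gram):
        g_r, s_r = divmod(r, n_subjects)
        for c, value in enumerate(row):
            g_c, s_c = divmod(c, n_subjects)
            if g_r != g_c and s_r != s_c:
                total += abs(value)
                count += 1
    return total / count if count else 0.0
```

Mutually orthogonal features give an entanglement of 0, while identical features for every gesture and subject give 1.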