Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis
Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi
TL;DR
The paper revisits score fusion for spoofing-aware speaker verification (SASV) through the lenses of decision theory and compositional data analysis. It shows that calibrating scores before fusion is beneficial, advocates fusing log-likelihood ratios (LLRs) rather than raw scores, and demonstrates on the SASV challenge data that a non-linear LLR fusion rule discriminates better than linear methods. It also links Gaussian back-end fusion to the optimal decision formulation under certain priors and costs. Overall, the work offers a principled framework for SASV fusion, with concrete calibration and fusion strategies, that improves robustness to zero-effort imposters and spoofing attacks.
Abstract
Fusing the outputs of automatic speaker verification (ASV) and spoofing countermeasure (CM) systems is expected to make an integrated system robust to both zero-effort imposters and synthesized spoofing attacks. Many score-level fusion methods have been proposed, but many of them remain heuristic. This paper revisits score-level fusion using tools from decision theory and presents three main findings. First, fusion by summing the ASV and CM scores can be interpreted on the basis of compositional data analysis, and score calibration before fusion is essential. Second, this interpretation leads to an improved fusion method that linearly combines the log-likelihood ratios (LLRs) of ASV and CM. Third, this linear combination is nevertheless inferior to a non-linear one for making optimal decisions. The outcomes of these findings, namely score calibration before fusion, improved linear fusion, and better non-linear fusion, were found to be effective on the SASV challenge database.
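The contrast between linear and non-linear LLR fusion can be illustrated with a minimal sketch. The abstract does not state the fusion formulas, so everything below is an assumption made for illustration: the function names, the weights `w_asv` and `w_cm`, the prior `rho`, and in particular the non-linear rule itself. The rule shown is one form that follows if the negative class is treated as a mixture of non-target speakers (prior `rho`) and spoofing attacks (prior `1 - rho`), and if the ASV and CM scores are already calibrated LLRs that can be combined as if independent. It should not be read as the authors' exact method.

```python
import math


def linear_llr_fusion(llr_asv, llr_cm, w_asv=0.5, w_cm=0.5):
    """Weighted sum of calibrated ASV and CM LLRs.

    The weights are hypothetical placeholders, not values from the paper.
    """
    return w_asv * llr_asv + w_cm * llr_cm


def nonlinear_llr_fusion(llr_asv, llr_cm, rho=0.5):
    """Assumed non-linear rule: -log(rho*exp(-llr_asv) + (1-rho)*exp(-llr_cm)).

    rho is an assumed prior of a non-target (zero-effort) trial among all
    negative trials; 1 - rho is the assumed prior of a spoofing attack.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    # Evaluate the log of the mixture with the log-sum-exp trick so that
    # large-magnitude LLRs do not overflow math.exp.
    a = math.log(rho) - llr_asv
    b = math.log1p(-rho) - llr_cm
    m = max(a, b)
    return -(m + math.log(math.exp(a - m) + math.exp(b - m)))


if __name__ == "__main__":
    # A trial the ASV system accepts strongly but the CM rejects strongly,
    # as might happen with a spoofed utterance of the target speaker.
    print(linear_llr_fusion(10.0, -10.0))
    print(nonlinear_llr_fusion(10.0, -10.0))
```

With equal weights, the linear rule lets the two scores cancel and returns 0 for this trial, whereas the assumed non-linear rule returns roughly -9.3, staying close to the smaller of the two LLRs. This matches the intuition that a trial should be accepted only if both subsystems support it.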
