Investigating Confidence Estimation Measures for Speaker Diarization

Anurag Chowdhury; Abhinav Misra; Mark C. Fuhs; Monika Woszczyna

TL;DR

This work addresses the propagation of diarization errors to downstream tasks by developing segment-level confidence measures that work with both white-box and black-box diarization systems. It evaluates multiple embedding-based scoring methods, including a spectral clustering variant and silhouette-based approaches, on the AMI and DoPaCo datasets with ECAPA-TDNN/xVector and E2E diarization pipelines. The findings show that silhouette-based confidence and related embedding-based methods consistently reduce the covered diarization error rate (cDER): the most competitive methods isolate roughly 30% of diarization errors within the segments that have the lowest roughly 10% of confidence scores. The results indicate practical value for downstream data selection and potential overlap-aware improvements, enabling more reliable speaker labeling in challenging multi-speaker conversations.
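As a rough illustration of what a silhouette-based segment confidence could look like, the sketch below computes the standard silhouette coefficient for each segment from its speaker embedding and its assigned speaker label. This is an assumption-laden sketch, not the authors' implementation: the function names, the choice of cosine distance, and the toy two-dimensional embeddings are all hypothetical, and the summary above does not specify how the paper maps silhouette values to confidence scores.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_confidence(embeddings, labels):
    """Return one standard silhouette value in [-1, 1] per segment.

    Hypothetical helper: higher values mean the segment embedding sits
    closer to its own speaker cluster than to the nearest other cluster.
    """
    scores = []
    for i, (emb, lab) in enumerate(zip(embeddings, labels)):
        # Collect distances to every other segment, grouped by speaker label.
        dists = {}
        for j, (other, other_lab) in enumerate(zip(embeddings, labels)):
            if i != j:
                dists.setdefault(other_lab, []).append(cosine_distance(emb, other))
        mean = {k: sum(v) / len(v) for k, v in dists.items()}
        if lab not in mean or len(mean) < 2:
            # Singleton cluster or only one speaker: silhouette is undefined, use 0.
            scores.append(0.0)
            continue
        a = mean[lab]                                   # cohesion with own speaker
        b = min(v for k, v in mean.items() if k != lab)  # nearest other speaker
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy data (made up): the last segment lies between the two speakers.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.5]]
toy_labels = ["A", "A", "B", "B", "A"]
toy_scores = silhouette_confidence(toy_embeddings, toy_labels)
```

In this toy case the ambiguous last segment receives the lowest score, which is the behavior a confidence measure of this kind relies on. Because it needs only embeddings and the output labels, a score like this can in principle be computed with an external embedding model when the diarization system itself is a black box.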

Abstract

Speaker diarization systems segment a conversation recording based on the speakers' identities. Such systems can misclassify the speaker of a portion of audio due to a variety of factors, such as speech pattern variation, background noise, and overlapping speech. These errors propagate to, and can adversely affect, downstream systems that rely on the speaker's identity, such as speaker-adapted speech recognition. One way to mitigate these errors is to provide segment-level diarization confidence scores to downstream systems. In this work, we investigate multiple methods for generating diarization confidence scores, including those derived from the original diarization system and those derived from an external model. Our experiments across multiple datasets and diarization systems demonstrate that the most competitive confidence score methods can isolate ~30% of the diarization errors within segments with the lowest ~10% of confidence scores.
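The headline figure, the share of diarization errors that falls within the lowest-confidence segments, can be made concrete with a small sketch. The function name, the per-segment error amounts, and the choice to take the lowest 10% by segment count (rather than by duration) are assumptions made for illustration; the abstract does not state how the paper computes this, and the numbers below are invented toy values, not results.

```python
def error_share_in_lowest(confidences, error_amounts, fraction=0.10):
    """Share of total error found in the lowest-confidence segments.

    Hypothetical helper. `error_amounts` holds a non-negative error
    quantity per segment (for example, seconds of misattributed speech),
    and `fraction` is the portion of segments, by count, to inspect.
    """
    if len(confidences) != len(error_amounts):
        raise ValueError("confidences and error_amounts must align")
    total_error = sum(error_amounts)
    if not confidences or total_error == 0:
        return 0.0
    # Rank segment indices from least to most confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    # Inspect at least one segment, even for very small inputs.
    k = max(1, round(fraction * len(order)))
    return sum(error_amounts[i] for i in order[:k]) / total_error


# Toy data (made up): ten segments, ten seconds of error in total.
toy_conf = [0.95, 0.10, 0.80, 0.85, 0.70, 0.90, 0.60, 0.88, 0.92, 0.75]
toy_err = [0, 3, 0, 1, 2, 0, 4, 0, 0, 0]
share = error_share_in_lowest(toy_conf, toy_err, fraction=0.10)
```

With these toy values the single least-confident segment carries 3 of the 10 error seconds, so `share` is 0.3. A downstream system could use the same ranking to drop or down-weight the least reliable segments during data selection.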
