Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Luyao Cheng; Siqi Zheng; Qinglin Zhang; Hui Wang; Yafeng Chen; Qian Chen; Shiliang Zhang

TL;DR

The paper tackles the problem of diarization under challenging acoustic conditions by exploiting semantic information from transcripts. It introduces Joint Pairwise Constraints Propagation (JPCP), which injects speaker-related semantic cues into clustering through must-link and cannot-link constraints, embedded into both embedding normalization and affinity refinement via constraint propagation. The approach combines SSDR-based constrained embedding normalization and a refined affinity function with enhanced constraint propagation (E^2CPM) to propagate sparse semantic constraints. Experiments on AISHELL-4 show that semantic constraints yield consistent improvements over acoustic-only baselines, with notable reductions in Text Diarization Error Rate (TextDER) and improved speaker count accuracy, and simulated constraints indicate upper-bound potential. The framework is modular and compatible with existing SD pipelines, suggesting practical impact for robust diarization in real-world meetings as language models and ASR improve.

Abstract

Speaker diarization has gained considerable attention within the speech processing research community. Mainstream speaker diarization systems rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Because speech signals also convey the content of what is said, we aim to fully exploit these semantic cues using language models. In this work, we propose a novel approach to effectively leverage semantic information in clustering-based speaker diarization systems. First, we introduce spoken language understanding modules to extract speaker-related semantic information and use this information to construct pairwise constraints. Second, we present a novel framework to integrate these constraints into the speaker diarization pipeline, enhancing the performance of the entire system. Extensive experiments conducted on a public dataset demonstrate the consistent superiority of our proposed approach over acoustic-only speaker diarization systems.

Paper Structure (15 sections, 6 equations, 3 figures, 1 table)

Introduction
Semantic speaker constraints
Semantic Speaker-related Tasks
Pairwise Constraints from Semantic Information
Constrained Speaker Diarization
Constrained Embedding Normalization
Constrained Affinity Function
Improve Constraints Propagation Algorithm
Experimental setup
Dataset and Metrics
Acoustic and Semantic Modules Configuration
Results and Discussions
Experimental Results
Constraints Analysis
Conclusion

Figures (3)

Figure 1: An example of the strategy for constructing constraints.
Figure 2: The pipeline is a traditional speaker diarization backend that uses acoustic information. The additional pairwise constraints constructed from semantic information, including Must-Link and Cannot-Link, are used in two parts: Embedding Normalization and Affinity Function.
Figure 3: The impact of the pairwise constraint rate on both clustering metrics and the effectiveness of the overall speaker diarization system.
