TCG CREST System Description for the Second DISPLACE Challenge

Nikhil Raghav; Subhajit Saha; Md Sahidullah; Swagatam Das

TL;DR

This report describes the speaker diarization (SD) and language diarization (LD) systems that the TCG CREST team developed for the Second DISPLACE Challenge, 2024. The final submissions use spectral clustering for both speaker and language diarization.

Abstract

In this report, we describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024. We contributed to Track 1 (SD) and Track 2 (LD), both set in multilingual, multi-speaker scenarios. We investigated different speech enhancement techniques, voice activity detection (VAD) techniques, unsupervised domain categorization, and neural embedding extraction architectures. We also explored the fusion of several embedding extraction models. We implemented our system with the open-source SpeechBrain toolkit. Our final submissions use spectral clustering for both speaker and language diarization. We achieved a relative improvement of about $7\%$ over the challenge baseline in Track 1, but obtained no improvement over the challenge baseline in Track 2.
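The abstract names spectral clustering as the final clustering step for both tasks. As a rough illustration only, the sketch below clusters segment-level embeddings with a cosine affinity matrix and a normalised graph Laplacian. It is not the authors' SpeechBrain pipeline: the function name `spectral_cluster`, the toy embeddings, the clipping of negative similarities, and the assumption that the number of clusters is known in advance are all hypothetical choices made for this example.

```python
import numpy as np


def spectral_cluster(embeddings, num_clusters, num_iters=20):
    """Hypothetical helper: spectral clustering of segment embeddings."""
    # L2-normalise rows so that the dot product is cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = np.clip(x @ x.T, 0.0, None)  # assumption: drop negative similarities

    # Symmetric normalised Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; keep the smallest ones.
    _, vecs = np.linalg.eigh(laplacian)
    spec = vecs[:, :num_clusters]
    spec = spec / np.linalg.norm(spec, axis=1, keepdims=True)

    # Deterministic farthest-point initialisation, then plain k-means.
    centers = [spec[0]]
    for _ in range(1, num_clusters):
        dists = np.min([np.sum((spec - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(spec[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(num_iters):
        sq_dist = ((spec[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        for k in range(num_clusters):
            if np.any(labels == k):
                centers[k] = spec[labels == k].mean(axis=0)
    return labels


# Toy usage: eight synthetic "segment embeddings" from two made-up speakers.
rng = np.random.default_rng(0)
spk_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((4, 3))
spk_b = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((4, 3))
labels = spectral_cluster(np.vstack([spk_a, spk_b]), num_clusters=2)
print(labels)
```

In a real diarization system the embeddings would come from a neural extractor applied to VAD-filtered speech segments, and the number of speakers or languages would usually have to be estimated, for example from the eigenvalue gaps of the Laplacian.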
