Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation

Matthias Bartolo

TL;DR

This work performs a linguistic analysis to verify accent and gender accuracy and to evaluate bias within the AB-1 Corpus dataset, and compares the performance of six slightly different model architectures.

Abstract

In security systems, forensic investigations, and personalized services, speech is a fundamental human input whose importance outweighs that of text-based interaction. This research examines the essential components of Speaker Identification (SID), with emphasis on Mel Spectrograms and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. It compares the performance of six slightly different model architectures and applies hyperparameter tuning to the best-performing one. It also performs a linguistic analysis to verify accent and gender accuracy and to evaluate bias within the AB-1 Corpus dataset.
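To make the two feature types named in the abstract concrete, the following is a minimal NumPy sketch of a log Mel spectrogram and of MFCCs derived from it. It is not the paper's pipeline: the frame length, hop size, number of Mel bands, number of coefficients, and all function names here are illustrative assumptions, and the DCT is left unnormalised for brevity.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Hz -> mel conversion.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the power spectrum, and pool it into mel bands."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return np.log(mel + 1e-10)                            # small offset avoids log(0)

def mfcc(log_mel, n_mfcc=13):
    """MFCCs as the first n_mfcc coefficients of an unnormalised DCT-II over mel bands."""
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # (n_mfcc, n_mels)
    return log_mel @ basis.T

# Usage: one second of a synthetic 440 Hz tone at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
log_mel = log_mel_spectrogram(tone, sr)
coeffs = mfcc(log_mel)
```

The Mel spectrogram keeps the full band-energy picture per frame, while the MFCCs compress each frame into a handful of decorrelated coefficients; either array can serve as the input to a classifier.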

Paper Structure

This paper contains 6 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 2: Curves demonstrating the best model's training and validation loss, as well as accuracy.
  • Figure 3: Confusion matrix showcasing the top 20 projected speakers for the best model.
  • Figure 4: Gender accuracy and bias evaluation for the best model's performance on the test set.
  • Figure 5: Accent accuracy and bias evaluation for the best model's performance on the test set.
  • Figure: Mel Spectrogram Feature Extraction
  • ...and 2 more figures
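
The gender and accent evaluations captioned above report accuracy per group on the test set. The exact bias metric used in the paper is not stated here, so the following is only one simple way such a breakdown could be computed: accuracy per group, plus the gap between the best and worst group. The function names and the toy labels are hypothetical.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for predictions split by a group label (e.g. gender or accent)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(per_group):
    """Difference between the best and worst per-group accuracy; 0 means no measured disparity."""
    return max(per_group.values()) - min(per_group.values())

# Usage with made-up speaker labels and group tags.
y_true = ["s1", "s2", "s3", "s4"]
y_pred = ["s1", "s2", "s3", "s1"]
groups = ["F", "F", "M", "M"]
per_group = accuracy_by_group(y_true, y_pred, groups)
gap = accuracy_gap(per_group)
```

A large gap suggests the model identifies speakers from one group more reliably than another, which is worth checking against how many speakers each group contributes to the dataset.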