On the application of Visibility Graphs in the Spectral Domain for Speaker Recognition

Hernan Bocaccio, Sergio Iglesias-Pérez, Miguel Romance, Regino Criado, Gabriel B. Mindlin

TL;DR

This paper addresses speaker recognition from spectral-domain signals by converting LPC-based spectral profiles into visibility graphs that capture topological patterns in speech spectra. It extracts four graph-based features (link density, average shortest path length, clustering coefficient, and modularity) from the spectra of five vowels across seven speakers and uses a Random Forest ensemble to classify speaker identity. SHAP analysis shows modularity to be a highly discriminative feature. Macro-averaged precision, recall, and F1 on an independent test set approach 0.95, while random-label baselines perform near chance. The spectral envelope is modeled by $H(f) = \frac{d_{0}}{1-\sum_{k=1}^{m} d_{k} e^{i k 2 \pi f \Delta}}$, computed with an LPC order of $m=13$ over 0–5512 Hz with 512 bins, and the method is robust to the choice of LPC order and correlation threshold. The results indicate that spectral-domain topology captures speaker-specific vocal-tract features and tolerates degradation of the spectral functions, suggesting practical applicability for biometric systems and expanding the toolbox for speech processing.
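
As a rough illustration of the envelope formula above, the following sketch estimates the coefficients $d_{k}$ with the autocorrelation method (Levinson-Durbin recursion) and evaluates $|H(f)|$ on 512 bins between 0 and 5512 Hz. Several choices here are assumptions that the summary does not state: the estimation method, the gain $d_{0}$ taken as the square root of the residual energy, the sampling rate of 11025 Hz (so that $\Delta = 1/11025$ s), the evenly spaced bin layout, and the function names. Because the coefficients are real, the magnitude is the same whichever sign is used in the exponent.

```python
import cmath
import math


def lpc_coefficients(signal, order):
    """Estimate d_1..d_m by the autocorrelation method (Levinson-Durbin).

    Returns (d, residual_energy). This is one common estimator, used here
    as an assumption; the source does not say which one the authors used.
    """
    n = len(signal)
    # Autocorrelation at lags 0..order.
    r = [sum(signal[i] * signal[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    d = [0.0] * (order + 1)  # d[k] is the k-th coefficient; d[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(d[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_d = d[:]
        new_d[i] = k
        for j in range(1, i):
            new_d[j] = d[j] - k * d[i - j]
        d = new_d
        err *= 1.0 - k * k
    return d[1:], err


def spectral_envelope(d, d0, sample_rate, f_max=5512.0, n_bins=512):
    """Evaluate |H(f)| = |d0 / (1 - sum_k d_k exp(i k 2 pi f Delta))|."""
    delta = 1.0 / sample_rate  # sampling interval (assumed from sample_rate)
    freqs = [f_max * j / (n_bins - 1) for j in range(n_bins)]
    mags = []
    for f in freqs:
        denom = 1.0 - sum(
            dk * cmath.exp(1j * k * 2.0 * math.pi * f * delta)
            for k, dk in enumerate(d, start=1)
        )
        mags.append(abs(d0 / denom))
    return freqs, mags


# Toy input (not speech): a decaying exponential, which an order-1
# predictor fits almost exactly. Real vowel recordings would use order=13.
toy = [0.9 ** n for n in range(200)]
d, err = lpc_coefficients(toy, order=1)
freqs, mags = spectral_envelope(d, math.sqrt(err), sample_rate=11025.0)
```

For a recorded vowel segment, the same two calls with `order=13` would give one 512-point spectral profile, which is the kind of series the visibility graph is then built from.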

Abstract

In this study, we explore the potential of visibility graphs in the spectral domain for speaker recognition. Adult participants were instructed to record vocalizations of the five Spanish vowels. For each vocalization, we computed the frequency spectrum considering the source-filter model of speech production, where formants are shaped by the vocal tract acting as a passive filter with resonant frequencies. Spectral profiles exhibited consistent intra-speaker characteristics, reflecting individual vocal tract anatomies, while showing variation between speakers. We then constructed visibility graphs from these spectral profiles and extracted various graph-theoretic metrics to capture their topological features. These metrics were assembled into feature vectors representing the five vowels for each speaker. Using an ensemble of decision trees trained on these features, we achieved high accuracy in speaker identification. Our analysis identified key topological features that were critical in distinguishing between speakers. This study demonstrates the effectiveness of visibility graphs for spectral analysis and their potential in speaker recognition. We also discuss the robustness of this approach, offering insights into its applicability for real-world speaker recognition systems. This research contributes to expanding the feature extraction toolbox for speaker recognition by leveraging the topological properties of speech signals in the spectral domain.
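
The graph construction and three of the four metrics can be sketched in plain Python. The sketch assumes the standard natural visibility criterion, in which two samples are linked when every sample between them lies strictly below the straight line joining them. The function names, the brute-force search, and the toy series are illustrative assumptions rather than the authors' implementation. Modularity is left out because it depends on a community-detection step that this summary does not specify.

```python
from collections import deque
from itertools import combinations


def natural_visibility_edges(y):
    """Return edges (a, b), a < b, between mutually visible samples.

    Brute-force check, cubic in the worst case; adequate for a few
    hundred points but not optimized.
    """
    n = len(y)
    edges = []
    for a in range(n - 1):
        for b in range(a + 1, n):
            # Every intermediate sample must lie below the line a-b.
            if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.append((a, b))
    return edges


def graph_metrics(n, edges):
    """Link density, average clustering, and average shortest path length."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    density = 2 * len(edges) / (n * (n - 1))

    # Average of the local clustering coefficients (0 for degree < 2).
    clustering = 0.0
    for v in range(n):
        k = len(adj[v])
        if k >= 2:
            links = sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])
            clustering += 2 * links / (k * (k - 1))
    clustering /= n

    # Breadth-first search from every node. A natural visibility graph is
    # connected because neighbouring samples always see each other.
    total = 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    avg_path = total / (n * (n - 1))

    return {"density": density, "clustering": clustering, "avg_path": avg_path}


# Toy series standing in for a spectral profile (values are made up).
toy_profile = [1.0, 3.0, 2.0, 4.0, 1.0]
toy_edges = natural_visibility_edges(toy_profile)
toy_metrics = graph_metrics(len(toy_profile), toy_edges)
```

Computing these values for each of the five vowels of a speaker and concatenating them gives a feature vector of the kind the abstract describes as input to the decision-tree ensemble.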

Paper Structure

This paper contains 16 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Example of an audio segment: (a) its amplitude over time, its spectrogram, and the spectrum profile obtained from it. From this spectrum profile we computed: (b) the connections between the elements of the series according to Eq. (2); (c) the associated natural visibility graph in a force-based layout.
  • Figure 2: Magnitude of the spectral functions of all subjects for the vocalization of the five Spanish vowels.
  • Figure 3: Distributions of the graph-based metrics obtained from the visibility graphs of all audio segments, grouped by subject and vowel.
  • Figure 4: Models using visibility-graph-based metrics as features. (a) Performance of the trained models on the independent test sets, showing precision, recall, and F1-score for each class (each subject) in the left panel, and macro-averaged performance in the right panel. (b) Feature importance analysis.
  • Figure 5: Evaluation of model robustness. We show the performance of the models after repeating the full procedure with different LPC orders in the computation of the spectral functions (a), and with different correlation thresholds in the selection of representative spectral functions (b). We also show the correlation probability, to clarify how performance is affected by spectral function degradation when representative samples are extracted at different correlation thresholds.