Articulatory strategy in vowel production as a basis for speaker discrimination

Justin J. H. Lo, Patrycja Strycharczuk, Sam Kirkham

TL;DR

The study investigates whether articulatory strategy in vowel production is sufficiently speaker-specific for discrimination by analyzing midsagittal tongue shapes from 40 English speakers using Generalised Procrustes Analysis and tangent-space PCA. It contrasts size-and-shape versus shape-only features and evaluates their speaker-discriminatory power through likelihood-ratio testing, reporting metrics such as $EER$ and $C_{llr}$. The results show tongue size (size-and-shape PC1) as the strongest discriminator, with anterior tongue dorsum curvature (shape PC3) also exhibiting notable individuality; combinations of shape features can approach the performance of size-and-shape, though inter-PC co-variation among speakers influences results. These findings support a holistic view of speaker discrimination that integrates anatomical and articulatory strategies and point to future work linking articulatory variation to acoustic correlates for a fuller phonetic model of speaker identity.
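
For readers unfamiliar with the two metrics, the sketch below shows how $EER$ and $C_{llr}$ are commonly computed from likelihood ratios obtained in same-speaker and different-speaker comparisons. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `cllr` and `eer` and the input arrays are hypothetical, and the $C_{llr}$ formula is the standard log-likelihood-ratio cost, which penalises same-speaker LRs below 1 and different-speaker LRs above 1.

```python
import numpy as np

def cllr(lr_same, lr_diff):
    """Log-likelihood-ratio cost from raw likelihood ratios (hypothetical helper)."""
    lr_same = np.asarray(lr_same, dtype=float)
    lr_diff = np.asarray(lr_diff, dtype=float)
    # Same-speaker comparisons are penalised when the LR is small.
    pen_same = np.mean(np.log2(1.0 + 1.0 / lr_same))
    # Different-speaker comparisons are penalised when the LR is large.
    pen_diff = np.mean(np.log2(1.0 + lr_diff))
    return 0.5 * (pen_same + pen_diff)

def eer(lr_same, lr_diff):
    """Equal error rate via a simple threshold sweep over the observed log LRs."""
    s = np.log10(np.asarray(lr_same, dtype=float))
    d = np.log10(np.asarray(lr_diff, dtype=float))
    thresholds = np.sort(np.concatenate([s, d]))
    # False rejections: same-speaker pairs scoring below the threshold.
    frr = np.array([np.mean(s < t) for t in thresholds])
    # False acceptances: different-speaker pairs scoring at or above it.
    far = np.array([np.mean(d >= t) for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])
```

A system that always outputs an LR of 1 carries no information and yields $C_{llr} = 1$; values below 1 indicate that the LRs are informative, and lower is better for both metrics.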

Abstract

The way speakers articulate is well known to be variable across individuals while at the same time subject to anatomical and biomechanical constraints. In this study, we ask whether articulatory strategy in vowel production can be sufficiently speaker-specific to form the basis for speaker discrimination. We conducted Generalised Procrustes Analyses of tongue shape data from 40 English speakers from the North West of England, and assessed the speaker-discriminatory potential of orthogonal tongue shape features within the framework of likelihood ratios. Tongue size emerged as the individual dimension with the strongest discriminatory power, while tongue shape variation in the more anterior part of the tongue generally outperformed tongue shape variation in the posterior part. When considered in combination, shape-only information may offer comparable levels of speaker specificity to size-and-shape information, but only when features do not exhibit speaker-level co-variation.
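
To make the distinction between size-and-shape and shape-only features concrete, here is a minimal sketch of Generalised Procrustes Analysis followed by PCA on the aligned contours. It assumes each tongue contour is an array of `k` two-dimensional points; the function names and the `scale` flag are hypothetical, and PCA on residuals from the mean shape is used here as a simple linear stand-in for tangent-space coordinates. It is not the authors' pipeline.

```python
import numpy as np

def rotate_onto(shape, ref):
    """Rotate a centred (k, 2) shape onto ref by orthogonal Procrustes."""
    u, _, vt = np.linalg.svd(shape.T @ ref)
    r = u @ vt
    if np.linalg.det(r) < 0:  # exclude reflections
        u[:, -1] *= -1
        r = u @ vt
    return shape @ r

def gpa(shapes, scale=True, max_iter=50, tol=1e-10):
    """Align shapes of dimension (n, k, 2).

    scale=True removes translation, rotation and size (shape-only);
    scale=False removes translation and rotation only (size-and-shape).
    """
    x = np.asarray(shapes, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)  # centre each contour
    if scale:
        x = x / np.linalg.norm(x, axis=(1, 2), keepdims=True)  # unit centroid size
    mean = x[0]
    for _ in range(max_iter):
        x = np.stack([rotate_onto(s, mean) for s in x])
        new_mean = x.mean(axis=0)
        if scale:
            new_mean = new_mean / np.linalg.norm(new_mean)
        delta = np.linalg.norm(new_mean - mean)
        mean = new_mean
        if delta < tol:
            break
    return x, mean

def pca_scores(aligned, mean):
    """PCA on residuals from the mean shape (linear approximation)."""
    resid = (aligned - mean).reshape(len(aligned), -1)
    resid = resid - resid.mean(axis=0)
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    return u * s, vt, s**2 / np.sum(s**2)  # scores, loadings, variance ratios
```

With `scale=False`, overall size differences between contours survive alignment and can load onto a principal component, which matches the reported role of size-and-shape PC1 as a tongue-size dimension; with `scale=True`, only differences in form remain.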

Paper Structure

This paper contains 10 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Effects of PC1–3 (left to right) in the size-and-shape PCA. Each panel shows the mean shape (heavy grey) and variation up to 3 SDs higher (solid red) and lower (dashed blue).
  • Figure 2: Effects of PC1–3 (left to right) in the shape PCA. Each panel shows the mean shape (heavy grey) and variation up to 3 SDs higher (solid red) and lower (dashed blue).
  • Figure 3: Performance of systems with size-and-shape (blue circles) and shape PCs (red triangles) as input.
  • Figure 4: Mean (heavy) and individual (light) tongue shapes of two speakers with the highest (solid red) and lowest (dashed blue) mean PC2 scores in the size-and-shape PCA, together with overall mean shape (solid black).