Modelled Multivariate Overlap: A method for measuring vowel merger
Irene Smith, Morgan Sonderegger, The Spade Consortium
TL;DR
Modelled Multivariate Overlap (MMO) addresses the need for a multivariate, uncertainty-aware measure of vowel overlap. It jointly models $F1$ and $F2$ with Bayesian linear mixed-effects models, then simulates joint distributions from the fitted model to compute overlap metrics such as Bhattacharyya affinity ($BA$). The method controls for unbalanced data, provides uncertainty estimates from posterior draws, and applies to context-conditioned mergers such as PIN-PEN across dialects. Applied to four English dialects, MMO yields model-based overlap estimates that better reflect theoretical expectations than estimates from traditional empirical distributions, with only subtle differences between minimal and expanded model structures. The framework is flexible and extends to other acoustic dimensions and merger scenarios; publicly available code is provided for replication and further research.
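The overlap step can be sketched as follows. Given simulated $(F1, F2)$ tokens for two vowels, $BA$ is $\sum_i \sqrt{p_i q_i}$ over a shared grid of bins, which is 1 for identical distributions and 0 for disjoint ones. This is a minimal sketch, not the authors' implementation: the histogram estimator, the 50-bin grid, the function name `bhattacharyya_affinity`, and the PIN/PEN means and covariance below are all hypothetical choices made for illustration.

```python
import numpy as np

def bhattacharyya_affinity(x, y, bins=50):
    """Histogram estimate of BA between two samples of (F1, F2) tokens.

    Both samples are binned on one shared 2D grid; BA = sum_i sqrt(p_i * q_i).
    (Hypothetical estimator chosen for illustration.)
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    both = np.vstack([x, y])
    # Shared bin edges spanning the pooled range of each formant
    edges = [np.linspace(both[:, d].min(), both[:, d].max(), bins + 1) for d in range(2)]
    p, _, _ = np.histogram2d(x[:, 0], x[:, 1], bins=edges)
    q, _, _ = np.histogram2d(y[:, 0], y[:, 1], bins=edges)
    # Normalize counts to probability mass per bin
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(np.sqrt(p * q)))

# Hypothetical "model-implied" (F1, F2) distributions for PIN and PEN, in Hz
rng = np.random.default_rng(1)
cov = [[2500.0, -1000.0], [-1000.0, 20000.0]]
pin = rng.multivariate_normal([450.0, 2000.0], cov, size=20000)
pen = rng.multivariate_normal([520.0, 1850.0], cov, size=20000)
ba = bhattacharyya_affinity(pin, pen)
print(f"BA(PIN, PEN) = {ba:.3f}")
```

Because $BA$ is computed over the full joint distribution, a correlation between $F1$ and $F2$ (the off-diagonal covariance term) affects the result, which a pair of separate univariate measures would not capture.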
Abstract
This paper introduces a novel method for quantifying vowel overlap. There is a tension in previous work between using multivariate measures, such as those derived from empirical distributions, and the ability to control for unbalanced data and extraneous factors, as is possible when using fitted model parameters. The method presented here resolves this tension by jointly modelling all acoustic dimensions of interest and then simulating distributions from the model to compute a measure of vowel overlap. An additional benefit of this method is that uncertainty in the overlap measure becomes straightforward to compute. We evaluate this method on corpus speech data targeting the PIN-PEN merger in four dialects of English and find that using modelled distributions to calculate Bhattacharyya affinity substantially improves results compared to using empirical distributions, while the difference between multivariate and univariate modelling is subtle.
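The uncertainty computation can be sketched as repeating the overlap calculation once per posterior draw and summarizing the resulting $BA$ values. In this minimal sketch, the posterior draws are synthetic stand-ins (jittered means with a fixed covariance), not output from any fitted model. For brevity, the closed-form $BA$ between two multivariate normals is swapped in for the simulate-then-bin step, which assumes each draw implies Gaussian vowel distributions. The names `ba_gaussian` and `ba_posterior` are hypothetical.

```python
import numpy as np

def ba_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form BA = exp(-D_B) between two multivariate normals."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    # D_B = mean-separation term + covariance-mismatch term (log-dets for stability)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    d_b = diff @ np.linalg.solve(cov, diff) / 8.0 + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(np.exp(-d_b))

def ba_posterior(draws_a, draws_b):
    """Compute BA once per posterior draw; return the median and a 95% interval."""
    bas = np.array([ba_gaussian(ma, ca, mb, cb)
                    for (ma, ca), (mb, cb) in zip(draws_a, draws_b)])
    lo, med, hi = np.quantile(bas, [0.025, 0.5, 0.975])
    return med, (lo, hi)

# Hypothetical stand-in for posterior draws of each vowel's (mean, covariance)
rng = np.random.default_rng(2)
cov = np.array([[2500.0, -1000.0], [-1000.0, 20000.0]])
draws_pin = [(rng.normal([450.0, 2000.0], [10.0, 30.0]), cov) for _ in range(1000)]
draws_pen = [(rng.normal([520.0, 1850.0], [10.0, 30.0]), cov) for _ in range(1000)]
med, (lo, hi) = ba_posterior(draws_pin, draws_pen)
print(f"BA median {med:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

The interval propagates posterior uncertainty about the vowel means into the overlap measure directly. No separate bootstrap is needed, because each draw already represents one plausible pair of vowel distributions.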
