A new kid on the block: Distributional semantics predicts the word-specific tone signatures of monosyllabic words in conversational Taiwan Mandarin
Xiaoyun Jin, Mirjam Ernestus, R. Harald Baayen
TL;DR
This paper examines how word meaning shapes the realization of pitch contours in monosyllabic words of spontaneous Taiwan Mandarin, challenging the view that canonical tones alone govern tonal realization. Using generalized additive models (GAMs) to decompose F0 contours into components tied to tone pattern, word, and word sense, the study finds robust semantic effects that often surpass the contribution of canonical tone. Heterographic homophones show distinct pitch signatures, and the pitch contours of individual tokens can be predicted from contextualized embeddings, supporting a semantic basis for tone realization that fits within the Discriminative Lexicon Model. Together, the findings argue for a distributional-semantic account of Mandarin tone and show that contextualized embeddings are useful for phonetic prediction, with implications for theories of tone and of the interaction between lexicon and phonology.
Abstract
We present a corpus-based investigation of how the pitch contours of monosyllabic words are realized in spontaneous conversational Mandarin, focusing on the effects of words' meanings. We used the generalized additive model to decompose a given observed pitch contour into a set of component pitch contours that are tied to different control variables and semantic predictors. Even when variables such as word duration, gender, speaker identity, tonal context, vowel height, and utterance position are controlled for, the effect of word remains a strong predictor of tonal realization. We present evidence that this effect of word is a semantic effect: word sense is shown to be a better predictor than word, and heterographic homophones are shown to have different pitch contours. The strongest evidence for the importance of semantics is that the pitch contours of individual word tokens can be predicted from their contextualized embeddings with an accuracy that substantially exceeds a permutation baseline. For phonetics, distributional semantics is a new kid on the block. Although our findings challenge standard theories of Mandarin tone, they fit well within the theoretical framework of the Discriminative Lexicon Model.
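To make the logic of the permutation baseline concrete, the following is a minimal sketch and not the authors' implementation. It assumes a ridge-regularized linear mapping from embeddings to pitch contours, an identification-style accuracy (a prediction counts as correct when it correlates best with its own observed contour), and purely synthetic data. The function names, dimensions, and noise level are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_linear_map(X, Y, ridge=1e-2):
    """Ridge-regularized least-squares map from embeddings X to contours Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)

def identification_accuracy(pred, obs):
    """Share of tokens whose predicted contour correlates best with its own
    observed contour (an assumed metric, chosen for illustration)."""
    P = pred - pred.mean(axis=1, keepdims=True)
    O = obs - obs.mean(axis=1, keepdims=True)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    O = O / np.linalg.norm(O, axis=1, keepdims=True)
    corr = P @ O.T  # rows: predictions, columns: observed contours
    return float(np.mean(corr.argmax(axis=1) == np.arange(len(obs))))

rng = np.random.default_rng(0)

# Synthetic stand-ins: 400 tokens, 20-dim "embeddings", 10-point "F0 contours".
n_tokens, n_dims, n_points = 400, 20, 10
X = rng.normal(size=(n_tokens, n_dims))
W_true = rng.normal(size=(n_dims, n_points))
Y = X @ W_true + 0.3 * rng.normal(size=(n_tokens, n_points))

X_train, X_test = X[:300], X[300:]
Y_train, Y_test = Y[:300], Y[300:]

# Observed accuracy: train on correctly paired embeddings and contours.
W = fit_linear_map(X_train, Y_train)
observed = identification_accuracy(X_test @ W, Y_test)

# Permutation baseline: shuffle the training embeddings so that the link
# between a token's embedding and its contour is broken, then refit.
baseline = []
for _ in range(20):
    shuffled = rng.permutation(len(X_train))
    W_perm = fit_linear_map(X_train[shuffled], Y_train)
    baseline.append(identification_accuracy(X_test @ W_perm, Y_test))

print(f"observed accuracy: {observed:.2f}")
print(f"permutation baseline (mean): {np.mean(baseline):.2f}")
```

The comparison is informative only because the permuted models see the same embeddings and the same contours as the real model. The only thing removed is the pairing between them, so any gain of the observed accuracy over the baseline reflects information that the embeddings carry about the contours.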
