A corpus-based investigation of pitch contours of monosyllabic words in conversational Taiwan Mandarin

Xiaoyun Jin, Mirjam Ernestus, R. Harald Baayen

Abstract

In Mandarin, the tonal contours of monosyllabic words produced in isolation or in careful speech are characterized by four lexical tones: a high-level tone (T1), a rising tone (T2), a dipping tone (T3) and a falling tone (T4). However, in spontaneous speech, the actual tonal realization of monosyllabic words can deviate significantly from these canonical tones due to intra-syllabic co-articulation and inter-syllabic co-articulation with adjacent tones. In addition, Chuang et al. (2024) recently reported that the tonal contours of disyllabic Mandarin words with the T2-T4 tone pattern are co-determined by their meanings. Following up on their research, we present a corpus-based investigation of how the pitch contours of monosyllabic words are realized in spontaneous conversational Mandarin, focusing on the effects of contextual predictors on the one hand, and the way in which words' meanings co-determine pitch contours on the other hand. We analyze the F0 contours of 3824 tokens of 63 different word types in a spontaneous Taiwan Mandarin corpus, using the generalized additive (mixed) model to decompose a given observed pitch contour into a set of component pitch contours. We show that the tonal context substantially modifies a word's canonical tone. Once the effect of tonal context is controlled for, T2 and T3 emerge as low flat tones, contrasting with T1 as a high tone, and with T4 as a high-to-mid falling tone. The neutral tone (T0), which, in standard descriptions, is realized based on the preceding tone, emerges as a low tone in its own right, modified by the other predictors in the same way as the standard tones T1, T2, T3, and T4. We also show that word, and even more so, word sense, co-determine words' F0 contours. Analyses of variable importance using random forests further support the substantial effect of tonal context and an effect of word sense.

Paper Structure (25 sections, 15 figures, 3 tables)

Figures (15)

  • Figure 1: The partial effect of log-transformed word duration by time for female (left) and male speakers (right), on the pitch contour for words with /u/. Darker colors indicate longer word durations.
  • Figure 2: Partial effect of time by tone sequence, for words with /a/ preceded and followed by pauses (upper panels), and for /a/ words with selected combinations of preceding and following tones (lower panels). The first number of a panel legend denotes the tone pattern of the token itself; the numbers following the dot specify the preceding and following tones, respectively. If there is no preceding or following tone, this is coded as 'NULL'.
  • Figure 3: Partial effect of time by tone sequence, for words with /a/ that carry a neutral tone, are preceded by one of the five tones, and are followed by a pause (coded as 'NULL').
  • Figure 4: Left panel: the partial effect of utterance position; Right panel: the partial effect of semantic relevance. Note that the range of the log-transformed pitch on the y-axis differs across the vowels.
  • Figure 5: Improvement in model fit (using AIC), for each of the four vowels, when adding tone pattern (pink bars), consonant (blue bars), word (yellow bars) and word sense (red bars) as a predictor to the baseline model.
  • ...and 10 more figures