
Form and meaning co-determine the realization of tone in Taiwan Mandarin spontaneous speech: the case of Tone 3 sandhi

Yuxin Lu, Yu-Ying Chuang, R. Harald Baayen

TL;DR

The paper investigates whether Tone 3 sandhi in spontaneous Taiwan Mandarin results in complete neutralization when disyllabic words with the tone patterns T2-T3 and T3-T3 are realized in conversation. Using a generalized additive mixed model (GAMM) on full f0 trajectories and a rich set of predictors, including word sense derived via BERT and WordNet, the authors demonstrate that Tone 3 sandhi is complete in this speech style once word sense is accounted for, and that word meaning shapes f0 contours as strongly as tonal context does. They show that word and speaker are the most influential predictors, with word sense offering a robust, semantically grounded account of tonal variation that sometimes outperforms traditional frequency-based explanations. The findings challenge the notion of incomplete neutralization in Taiwan Mandarin spontaneous speech and highlight the importance of meaning-bearing, exemplar-like representations in tonal realization, with implications for cross-dialect phonetics and usage-based theories of speech production. The analysis relies on a GAMM with scaled-$t$ errors to model f0 trajectories across normalized time and context, $(y-\mu)/\sigma \sim t_\nu$, where $y$ is the observed f0, $\mu$ the mean predicted by the model's smooth and random-effect terms, $\sigma$ a scale parameter, and $\nu$ the degrees of freedom of the $t$ distribution.
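The scaled-$t$ error model can be made concrete by writing out its log-density. The sketch below is illustrative only: it evaluates $\log p(y)$ when $(y-\mu)/\sigma \sim t_\nu$ and compares it with a Gaussian of the same scale, showing why heavy tails penalize an outlying f0 value (for example, a pitch-tracking error) far less. The values of $\mu$, $\sigma$ and $\nu$ are hypothetical, not estimates from the paper. In R's mgcv package this error model is available as the `scat` family; whether the authors used exactly that family is not stated here.

```python
import math


def scaled_t_logpdf(y: float, mu: float, sigma: float, nu: float) -> float:
    """Log-density of y under the scaled-t model (y - mu) / sigma ~ t_nu."""
    if sigma <= 0 or nu <= 0:
        raise ValueError("sigma and nu must be positive")
    z = (y - mu) / sigma
    # Log of the standard Student-t density evaluated at z.
    log_t = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )
    # Change of variables from z back to y adds the Jacobian term -log(sigma).
    return log_t - math.log(sigma)


def gaussian_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log-density of y under a Gaussian with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


if __name__ == "__main__":
    # Hypothetical values: predicted f0 of 200 Hz, scale 10 Hz, nu = 3,
    # a typical measurement at 205 Hz and an outlying one at 250 Hz.
    mu, sigma, nu = 200.0, 10.0, 3.0
    for y in (205.0, 250.0):
        print(y, round(scaled_t_logpdf(y, mu, sigma, nu), 2), round(gaussian_logpdf(y, mu, sigma), 2))
```

As $\nu$ grows, the scaled-$t$ density approaches the Gaussian, so a small estimated $\nu$ signals that heavy tails matter for the data.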

Abstract

In Standard Chinese, Tone 3 (the dipping tone) becomes Tone 2 (the rising tone) when followed by another Tone 3. Previous studies have noted that this sandhi process may be incomplete, in the sense that the assimilated Tone 3 is still distinct from a true Tone 2. While Mandarin Tone 3 sandhi has been widely studied using carefully controlled laboratory speech (Xu, 1997) and more formal registers of Beijing Mandarin (Yuan and Chen, 2014), less is known about its realization in spontaneous speech and about the effect of contextual factors on tonal realization. The present study investigates the pitch contours of two-character words with T2-T3 and T3-T3 tone patterns in spontaneous Taiwan Mandarin conversations. Our analysis makes use of the Generalized Additive Mixed Model (GAMM, Wood, 2017) to examine fundamental frequency (f0) contours as a function of normalized time. We consider various factors known to influence pitch contours, including gender, speaking rate, speaker, neighboring tones, word position, and bigram probability, as well as two novel predictors: word and word sense (Chuang et al., 2024). Our analyses reveal that in spontaneous Taiwan Mandarin, T3-T3 words become indistinguishable from T2-T3 words, indicating complete sandhi, once the strong effect of word (or word sense) is taken into account. For our data, the shape of f0 contours is not co-determined by word frequency. In contrast, the effect of word meaning on f0 contours is robust, as strong as the effect of adjacent tones, and present for both T2-T3 and T3-T3 words.
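Modeling f0 "as a function of normalized time" means that each word token's pitch track is placed on a common 0-to-1 time axis, so tokens of different durations can be compared. The sketch below shows one common way to do this, under our own assumptions: time is rescaled linearly from word onset (0) to offset (1), and f0 is then linearly interpolated at a fixed number of equally spaced points. The function names, the number of points, and the example track are hypothetical; the paper's actual preprocessing is not described in this summary.

```python
from typing import List, Sequence, Tuple


def normalize_time(times: Sequence[float]) -> List[float]:
    """Map strictly increasing sample times (s) of one token onto [0, 1]."""
    if len(times) < 2 or any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("need at least two strictly increasing time stamps")
    t0, span = times[0], times[-1] - times[0]
    return [(t - t0) / span for t in times]


def resample_contour(
    times: Sequence[float], f0: Sequence[float], n_points: int = 10
) -> List[Tuple[float, float]]:
    """Linearly interpolate an f0 track at n_points equally spaced normalized times."""
    if len(times) != len(f0) or n_points < 2:
        raise ValueError("times and f0 must align, and n_points must be >= 2")
    norm = normalize_time(times)
    grid = [i / (n_points - 1) for i in range(n_points)]
    out = []
    j = 0
    for g in grid:
        # Advance to the segment [norm[j], norm[j + 1]] that contains g.
        while j < len(norm) - 2 and norm[j + 1] < g:
            j += 1
        x0, x1 = norm[j], norm[j + 1]
        w = (g - x0) / (x1 - x0)
        out.append(f0[j] + w * (f0[j + 1] - f0[j]))
    return list(zip(grid, out))


if __name__ == "__main__":
    # Hypothetical token: four f0 samples (Hz) between 0.50 s and 0.70 s.
    for t, hz in resample_contour([0.50, 0.55, 0.60, 0.70], [200.0, 180.0, 190.0, 230.0], 5):
        print(round(t, 2), round(hz, 1))
```

In a GAMM, normalized time can also enter directly as a continuous covariate without resampling; the fixed grid here simply makes tokens easy to average and plot.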

Paper Structure (20 sections, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Example pitch contours for selected tokens of 了解 (liao3jie3, to know) (upper panels) and 媒體 (mei2ti3, media) (lower panels). Xu (1997) observed for laboratory speech that the f0 contours of T2-T3 and T3-T3 words consist of a slight fall, followed by a rise, and then a fall. A similar pattern is visible for the tokens on the left-hand side, but very different realizations are found in the remaining panels.
  • Figure 2: Plots of the partial effect of the three-way interaction of time by gender by tone pattern. Panels 1 and 2 show the partial effect of tone pattern T2-T3 for female and male speakers respectively, and panels 3 and 4 show the partial effect of tone pattern T3-T3 for female and male speakers respectively. The dashed red vertical line indicates the average syllable boundary. The dashed red horizontal line marks zero (the x-axis).
  • Figure 3: Speech rate by gender. Left panel: partial effect for female speakers, center panel: partial effect for male speakers, right panel: interaction of speech rate by time, for both genders.
  • Figure 4: Plot of the partial effect of the tonal context. In the legend, tonal contexts are indicated by the preceding and following tones.
  • Figure 5: Partial effect of word position in utterance. Left panel: partial main effect; right panel: interaction of word position by time.
  • ...and 5 more figures
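Figure 2 marks the average syllable boundary on the normalized time axis. How this position was computed is not stated in the figure list; the sketch below assumes it is the mean, over tokens, of the first syllable's share of the word's total duration, which places the boundary between 0 (word onset) and 1 (word offset). The function names and the example durations are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def relative_boundary(syl1_dur: float, syl2_dur: float) -> float:
    """Position of the syllable boundary in normalized time (0 = word onset, 1 = offset)."""
    total = syl1_dur + syl2_dur
    if syl1_dur < 0 or syl2_dur < 0 or total <= 0:
        raise ValueError("durations must be non-negative with a positive sum")
    return syl1_dur / total


def average_boundary(tokens: Iterable[Tuple[float, float]]) -> float:
    """Mean boundary position over (first-syllable, second-syllable) duration pairs."""
    return mean(relative_boundary(d1, d2) for d1, d2 in tokens)


if __name__ == "__main__":
    # Hypothetical durations in seconds for three disyllabic tokens.
    tokens = [(0.20, 0.30), (0.15, 0.15), (0.25, 0.25)]
    print(round(average_boundary(tokens), 3))
```

Averaging relative positions, rather than absolute times, keeps the boundary on the same 0-to-1 scale as the time axis of the partial-effect plots.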