Automated Tone Transcription and Clustering with Tone2Vec

Yi Yang, Yiming Wang, ZhiQiang Tang, Jiahong Yuan

TL;DR

Tone2Vec addresses the high cost of lexical tone transcription in Sino-Tibetan languages by converting discrete tone transcriptions into pitch-based, continuous representations. It introduces a novel pitch-curve representation, a pitch-aware loss for automatic transcription, and clustering approaches, all integrated in the open-source ToneLab package. Empirical results show Tone2Vec improves dialect clustering accuracy and transcription performance with relatively small datasets, supporting scalable cross-dialect tonal analysis and fieldwork. This work offers a practical tool and methodological advances for documenting endangered tonal languages while enabling robust linguistic inquiry.

Abstract

Lexical tones play a crucial role in Sino-Tibetan languages. However, current phonetic fieldwork relies on manual effort, resulting in substantial time and financial costs. This is especially challenging for the numerous endangered languages that are rapidly disappearing, a problem often compounded by limited funding. In this paper, we introduce pitch-based similarity representations for tone transcription, named Tone2Vec. Experiments on dialect clustering and variance show that Tone2Vec effectively captures fine-grained tone variation. Utilizing Tone2Vec, we develop the first automatic approach for tone transcription and clustering by presenting a novel representation transformation for transcriptions. Additionally, these algorithms are systematically integrated into an open-source and easy-to-use package, ToneLab, which facilitates automated fieldwork and cross-regional, cross-lexical analysis for tonal languages. Extensive experiments were conducted to demonstrate the effectiveness of our methods.

Paper Structure (26 sections, 7 equations, 13 figures, 7 tables)

Figures (13)

  • Figure 1: Overview of our proposed methods. From left to right: Tone2Vec module for representations, Transcription module for automated tone transcription, and Clustering module for clustering tonal data.
  • Figure 2: Fundamental frequency (F0, represented with solid lines) and transcription (e.g., (55) indicating a High tone) for the four basic Mandarin tones.
  • Figure 3: Left: Visual simulations using transcription sequences $l_{1}$ = (55) (green linear curve), $l_{2}$ = (41) (red linear curve), and $l_{3}$ = (312) (blue quadratic curve). Grey shading denotes the area between (41) and (312). Right: The number 2.27 with grey shading represents the calculated distance between (41) and (312).
  • Figure : (a) Gold-standard
  • Figure : (a) Tone2Vec
  • ...and 8 more figures
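
The distance illustrated in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration, not ToneLab's API: the function names `pitch_curve` and `tone_distance` are hypothetical, and two details are assumptions chosen because they reproduce the 2.27 value quoted in the caption: the pitch levels are anchored at evenly spaced positions on the interval [1, 3], and the distance is the absolute area between the two interpolated curves. Two-digit transcriptions give a linear curve and three-digit transcriptions a quadratic one, as the caption describes.

```python
def pitch_curve(tone):
    """Return a function f(x) on [1, 3] for a tone written as digits, e.g. "41".

    Assumption: pitch levels sit at evenly spaced x positions on [1, 3],
    so two levels give a line and three levels give a quadratic.
    """
    levels = [int(ch) for ch in tone]
    if len(levels) == 1:          # treat a single level as a flat tone
        levels = levels * 2
    n = len(levels)
    xs = [1 + 2 * i / (n - 1) for i in range(n)]

    def f(x):
        # Lagrange interpolation through the (x, level) anchor points.
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, levels)):
            term = float(yi)
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return f


def tone_distance(tone_a, tone_b, steps=20000):
    """Approximate the area between two pitch curves on [1, 3] (midpoint rule)."""
    fa, fb = pitch_curve(tone_a), pitch_curve(tone_b)
    h = 2.0 / steps
    area = 0.0
    for k in range(steps):
        x = 1 + (k + 0.5) * h
        area += abs(fa(x) - fb(x))
    return area * h


if __name__ == "__main__":
    # The caption's example pair: (41) versus (312).
    print(round(tone_distance("41", "312"), 2))  # 2.27
```

Under these assumptions the area between (41) and (312) comes out at about 2.27, matching the grey-shaded value in the caption; other anchor spacings or norms would give different numbers.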