Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning
Yi Shi, Congyi Wang, Yu Chen, Bin Wang
TL;DR
This work tackles Mandarin polyphone disambiguation for text-to-speech by introducing Semi-PPL, a semi-supervised learning framework that uses large-scale unlabeled text to improve grapheme-to-phoneme (G2P) conversion. The approach uses a compact base model built on tiny-Electra and a Conv-BLSTM classifier, enhanced with consistency regularization across text augmentations and entropy-driven pseudo labeling, including dictionary-assisted labeling for monophonic words. Experiments on a dedicated dataset (326K labeled training sentences, 1,100 challenging test sentences, and 25.4M unlabeled lines) show state-of-the-art accuracy with substantially lower model complexity than BERT-based methods. The authors also publish a sizable labeled benchmark to promote further research and practical deployment in Mandarin G2P systems.
Abstract
The majority of Chinese characters are monophonic, while a special group of characters, called polyphonic characters, have multiple pronunciations. As a prerequisite for speech-related generative tasks, the correct pronunciation must be identified among several candidates. This process is called polyphone disambiguation. Although the problem has been well explored with both knowledge-based and learning-based approaches, it remains challenging due to the lack of publicly available labeled datasets and the irregular nature of polyphones in Mandarin Chinese. In this paper, we propose a novel semi-supervised learning (SSL) framework for Mandarin Chinese polyphone disambiguation that can potentially leverage unlimited unlabeled text data. We explore the effect of various proxy labeling strategies, including entropy thresholding and lexicon-based labeling. Qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art performance. In addition, we publish a novel dataset specifically for the polyphone disambiguation task to promote further research.
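To make the two proxy labeling strategies named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that entropy thresholding means accepting a model prediction as a pseudo label only when the Shannon entropy of its predicted distribution falls below a cutoff, and that lexicon-based labeling means looking up a whole word in a pronunciation dictionary. The function names, the threshold value of 0.3, and the toy lexicon are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pseudo_label(
    probs: Sequence[float],
    candidates: Sequence[str],
    max_entropy: float = 0.3,  # hypothetical threshold, not from the paper
) -> Optional[str]:
    """Return the most likely pronunciation if the model is confident enough.

    A prediction is kept as a pseudo label only when the entropy of the
    predicted distribution is at most `max_entropy`; otherwise the sample
    is left unlabeled (None).
    """
    if len(probs) != len(candidates):
        raise ValueError("probs and candidates must have the same length")
    if entropy(probs) > max_entropy:
        return None
    best = max(range(len(probs)), key=lambda i: probs[i])
    return candidates[best]


def lexicon_label(word: str, lexicon: Dict[str, List[str]]) -> Optional[List[str]]:
    """Look up a word's pronunciation in a dictionary, if it is listed."""
    return lexicon.get(word)


# Toy lexicon (hypothetical): one word whose reading is fixed by the dictionary.
LEXICON: Dict[str, List[str]] = {"银行": ["yin2", "hang2"]}

if __name__ == "__main__":
    readings = ["chang2", "zhang3"]  # two readings of the character 长
    print(pseudo_label([0.97, 0.03], readings))  # confident: kept
    print(pseudo_label([0.60, 0.40], readings))  # uncertain: rejected
    print(lexicon_label("银行", LEXICON))
```

In this sketch a lower threshold admits fewer but cleaner pseudo labels, while a higher one admits more unlabeled text at the cost of noisier supervision; the right trade-off would have to be tuned empirically.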
