Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis

Minu Kim, Kangwook Jang, Hoirin Kim

TL;DR

This study provides an in-depth analysis of source language selection, supported by a practical approach to assessing phonetic proximity across multiple language families. It also investigates how within-family similarity affects performance in multilingual training, which aids in understanding language dynamics.

Abstract

This paper examines how linguistic similarity affects cross-lingual phonetic representation in speech processing for low-resource languages, with an emphasis on effective source language selection. Previous cross-lingual research has used various source languages to improve performance on a target low-resource language without thoroughly considering how those source languages are selected. Our study stands out by providing an in-depth analysis of language selection, supported by a practical approach to assessing phonetic proximity among multiple language families. We investigate how within-family similarity affects performance in multilingual training, which aids in understanding language dynamics. We also evaluate the effect of using phonologically similar languages regardless of family. For the phoneme recognition task, using phonologically similar languages consistently achieves a relative improvement of 55.6% over monolingual training, even surpassing the performance of a large-scale self-supervised learning model. Multilingual training within the same language family shows that higher phonological similarity improves performance, while lower similarity degrades performance relative to monolingual training.
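
The abstract does not spell out how phonetic proximity is measured; the Figure 2 caption describes cosine similarity between languages based on phoneme distribution. The following is a minimal Python sketch of that idea, not the authors' code. It assumes each language's corpus is available as utterances already segmented into IPA phoneme tokens, and that a "phoneme distribution" means relative phoneme frequencies over a shared inventory. The function names and toy corpora are hypothetical.

```python
from collections import Counter
from math import sqrt


def phoneme_distribution(utterances, inventory):
    """Relative frequency of each phoneme in `inventory` over one language's utterances.

    Assumption: each utterance is already a list of IPA phoneme tokens.
    """
    counts = Counter(p for utt in utterances for p in utt)
    total = sum(counts[p] for p in inventory)
    if total == 0:
        return [0.0] * len(inventory)
    return [counts[p] / total for p in inventory]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(corpora):
    """Pairwise cosine similarity of phoneme distributions.

    corpora: dict mapping a language label to a list of phoneme-token lists.
    Returns (sorted language labels, square similarity matrix as nested lists).
    """
    langs = sorted(corpora)
    # Shared inventory: union of all phonemes observed in any language.
    inventory = sorted({p for utts in corpora.values() for utt in utts for p in utt})
    vectors = {lang: phoneme_distribution(corpora[lang], inventory) for lang in langs}
    return langs, [[cosine(vectors[a], vectors[b]) for b in langs] for a in langs]


if __name__ == "__main__":
    # Toy corpora with hypothetical labels; not data from the paper.
    toy = {
        "A": [["a", "b", "a"]],
        "B": [["a", "b", "b"]],
        "C": [["x", "y"]],
    }
    labels, sim = similarity_matrix(toy)
    for label, row in zip(labels, sim):
        print(label, [round(s, 2) for s in row])
```

In this toy case A and B share phonemes and score 0.8, while C shares none with them and scores 0.0. Each row of the resulting matrix can then serve as a language's similarity profile, in the spirit of the Figure 2 caption; the paper's actual inventory, segmentation, and normalization may differ.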

Paper Structure

This paper contains 10 sections, 3 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: The Conformer-based model is trained to infer IPA sequences for each low-resource language. The best results are achieved by training with the top 3 languages most phonologically similar to the target language (i.e., $|\mathcal{S}|=3$ in Equation (3a)). The Phoneme-level Language Model (PLM) may or may not be used in decoding, depending on the experimental condition.
  • Figure 2: The language similarity matrix gives the cosine similarity between each pair of the 22 languages based on their phoneme distributions. Each row lists one language's pairwise similarities and serves as that language's embedding.
  • Figure 3: (Left) Language family contours, based on phoneme distribution similarities, illustrate phonological relationships. (Right) Contour widths represent the first PCA component, showing the highest similarity among Turkic languages compared to other families.
  • Figure 4: Typological language distribution reflects similarity between languages and supports the corpus-based similarity assessment. The Turkic family shows the highest similarity, while other families exhibit lower similarity.
  • Figure 5: Training data amounts are shown for three representative languages from each language family, comparing the recorded data size when training with all 22 languages, with languages from the same family, and with the three closest languages. Other languages show similar overall trends.
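
The captions describe three ways of choosing source languages for a target: all 22 languages, languages from the target's family, and the $|\mathcal{S}|=3$ most phonologically similar languages. The Python sketch below builds those three pools from a precomputed similarity matrix. It assumes that Equation (3a), which is not reproduced here, amounts to ranking languages by similarity to the target and keeping the top $|\mathcal{S}|$; the language labels, family labels, and matrix values are toy placeholders, not data from the paper.

```python
def top_k_similar(target, langs, matrix, k=3):
    """Return the k languages (excluding the target) most similar to the target.

    Assumption: Equation (3a) reduces to ranking by similarity and keeping the top |S| = k.
    Ties are broken alphabetically by label so the result is deterministic.
    """
    i = langs.index(target)
    candidates = [(matrix[i][j], langs[j]) for j in range(len(langs)) if j != i]
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [lang for _, lang in candidates[:k]]


def source_pools(target, langs, matrix, family_of, k=3):
    """Build the three source-language conditions: all, same family, and closest k.

    The target's own data is assumed to be added to every pool at training time.
    """
    others = [lang for lang in langs if lang != target]
    return {
        "all": others,
        "family": [lang for lang in others if family_of[lang] == family_of[target]],
        "closest": top_k_similar(target, langs, matrix, k),
    }


if __name__ == "__main__":
    # Toy labels, families, and similarity values; not results from the paper.
    langs = ["A", "B", "C", "D", "E"]
    family_of = {"A": "F1", "B": "F1", "C": "F1", "D": "F2", "E": "F2"}
    matrix = [
        [1.0, 0.9, 0.4, 0.7, 0.2],
        [0.9, 1.0, 0.5, 0.6, 0.1],
        [0.4, 0.5, 1.0, 0.3, 0.8],
        [0.7, 0.6, 0.3, 1.0, 0.5],
        [0.2, 0.1, 0.8, 0.5, 1.0],
    ]
    for name, pool in source_pools("A", langs, matrix, family_of).items():
        print(name, pool)
```

With target A, the "closest" pool is B, D, C: D comes from a different family but is more similar to A than C is, which illustrates selecting languages by phonological similarity regardless of family.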