Methods of Automatic Matrix Language Determination for Code-Switched Speech

Olga Iakovenko, Thomas Hain

TL;DR

This work addresses automatic determination of the Matrix Language (ML) in code-switched (CS) utterances by leveraging Matrix Language Frame theory to derive three MLID predictors from text (P1.1, P1.2, P2) and three from audio (MLID_P1.1, MLID_P1.2, MLID_P2). It shows that audio-based MLID aligns more closely with the textual principles than conventional acoustic LID and can outperform LID in an MLID recognition task, reaching an F1 macro of 60% and a correlation score (MCC) of 0.38. Through correlation analyses across the SEAME and Miami CS data, the study reveals that the non-English languages (Mandarin and Spanish) more often serve as the matrix language, even though utterance-level LID remains English-dominated. The findings suggest that MLID-informed analysis can improve CS processing and downstream NLP/ASR applications, though progress is limited by data availability and by the lack of full coverage of the proposed principles. Overall, the paper provides a principled framework for automatic ML determination with demonstrated advantages over traditional LID in CS contexts, and lays the groundwork for further integration into multilingual modeling tasks.

Abstract

Code-switching (CS) is the process of speakers alternating between two or more languages, which is becoming increasingly common in the modern world. To better describe CS speech, the Matrix Language Frame (MLF) theory introduces the concept of a Matrix Language (ML), the language that provides the grammatical structure for a CS utterance. In this work, the MLF theory was used to develop systems for Matrix Language Identity (MLID) determination. The MLID of English/Mandarin and English/Spanish CS text and speech was compared to acoustic language identity (LID), which is a typical way to identify a language in monolingual utterances. MLID predictors from audio show higher correlation with the textual principles than LID in all cases, while also outperforming LID in an MLID recognition task based on F1 macro (60%) and correlation score (0.38). This novel approach has identified that non-English languages (Mandarin and Spanish) are preferred over English as the ML, contrary to the monolingual choice of LID.
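
The abstract reports the MLID recognition task in terms of F1 macro and a correlation score. As a minimal sketch of how such scores can be computed for a two-class ML decision, the following assumes the correlation score is the Matthews correlation coefficient (MCC) named in the TL;DR. The label strings ("zh", "en"), the toy reference and predicted sequences, and all function names are hypothetical illustrations, not the authors' data or evaluation code.

```python
from math import sqrt


def confusion(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def f1_macro(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp, fp, fn, _ = confusion(y_true, y_pred, label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcc(y_true, y_pred, positive):
    """Matthews correlation coefficient for a binary decision."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: reference ML per utterance vs. a predictor's output.
reference = ["zh", "zh", "zh", "en", "en", "zh"]
predicted = ["zh", "en", "zh", "en", "zh", "zh"]

macro_f1 = f1_macro(reference, predicted, labels=["zh", "en"])
corr = mcc(reference, predicted, positive="zh")
print(f"F1 macro = {macro_f1:.3f}, MCC = {corr:.3f}")
```

F1 macro averages the per-class scores without weighting by class frequency, so a predictor that always outputs the majority language is penalized on the minority class. MCC likewise stays near zero for such a predictor, which is one reason both metrics suit imbalanced ML label distributions.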

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Detection error tradeoff (DET) curve for possible $\log\alpha$ values. The thin diamond marks the default value $\log\alpha=0$, the thick diamond marks the result of $\log\alpha$ estimation, and the red star marks the ground-truth $\log\alpha$.
  • Figure 2: Pipeline of the morpheme order-based principle for ML determination P1.2.
  • Figure 3: Correlations between acoustic $LID$ and $MLID$ outputs and textual P1.1, P1.2 and P2 for CS SEAME data. Each bar segment represents the correlation of a LID or MLID model with one textual principle, so the whole bar represents the sum of the correlations.
  • Figure 4: Correlations between acoustic $LID$ and $MLID$ outputs and textual P1.1, P1.2 and P2 for CS Miami data.