TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition

Tran Nguyen Anh; Truong Dinh Dung; Vo Van Nam; Minh N. H. Nguyen

TL;DR

TSPC is a Two-Stage Phoneme-Centric architecture for Vietnamese-English code-switching (CS) ASR. It uses an extended Vietnamese phoneme set as an intermediate representation for mixed-lingual modeling, while remaining efficient under low computational-resource constraints.

Abstract

Code-switching (CS) presents a significant challenge for general Automatic Speech Recognition (ASR) systems. Existing methods often fail to capture the subtle phonological shifts inherent in CS scenarios. The challenge is particularly difficult for language pairs like Vietnamese and English, where distinct phonological features coexist with ambiguity arising from similar-sounding speech. In this paper, we propose a novel architecture for Vietnamese-English CS ASR, a Two-Stage Phoneme-Centric model (TSPC). TSPC adopts a phoneme-centric approach based on an extended Vietnamese phoneme set as an intermediate representation for mixed-lingual modeling, while remaining efficient under low computational-resource constraints. Experimental results demonstrate that TSPC consistently outperforms existing baselines, including PhoWhisper-base, in Vietnamese-English CS ASR, achieving a significantly lower word error rate of 19.06% with reduced training resources. Furthermore, the phonetic-based two-stage architecture enables phoneme adaptation and language conversion to enhance ASR performance in complex Vietnamese-English CS ASR scenarios.
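The abstract reports its headline result as a word error rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the conventional definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the authors' evaluation script. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made only for illustration, and the paper may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level Levenshtein distance / reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no casing or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative (made-up) code-switched sentence with one substituted word.
    reference = "toi muon book mot meeting"
    hypothesis = "toi muon buc mot meeting"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a WER of 19.06% corresponds to roughly 19 word-level errors per 100 reference words, aggregated over the test set.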
