Compositional Phoneme Approximation for L1-Grounded L2 Pronunciation Training

Jisang Park, Minu Kim, DaYoung Hong, Jongha Lee

TL;DR

The paper tackles the problem that L2 learners map unfamiliar phonemes onto their L1 categories, which hinders efficient pronunciation training. It introduces compositional phoneme approximation (CPA), an L1-grounded, feature-space method that composes each L2 phoneme from a sequence of L1 phonemes chosen for their phonological features: vowels are built from two L1 vowels, and consonants from base L1 segments plus context-driven changes. After a 10-minute training session with 20 Korean learners of English across 18 target items, CPA achieves a 76.0% in-box formant rate and 53.4% phoneme-recognition accuracy, and CPA productions are judged more native-like in about 80% of word-level comparisons, indicating robust gains after brief training. These results suggest that leveraging L1 articulatory knowledge through CPA can accelerate early-stage L2 pronunciation learning; extending the method to suprasegmentals and cross-script adaptation is left for future work.
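To make the vowel side of CPA concrete, here is a minimal Python sketch under stated assumptions, not the authors' implementation. Each vowel gets a hypothetical (height, backness, rounding) feature vector scaled to [0, 1]; the coordinates, the `L1_VOWELS` and `L2_TARGETS` tables, and the helpers `composite` and `approximate` are all illustrative and do not come from the paper. The sketch assumes a composite sound's features are the mean of its two parts and picks the pair of distinct L1 vowels whose composite lies closest to the L2 target. The paper realizes the pair as an ordered sequence with a formant transition, which this averaging model ignores, and the consonant side (inserted L1 segments) is not modeled.

```python
import math
from itertools import combinations

# Hypothetical (height, backness, rounding) features in [0, 1].
# Values are illustrative only; they are NOT taken from the paper.
L1_VOWELS = {
    "i": (1.0, 0.0, 0.0),
    "e": (0.5, 0.0, 0.0),
    "a": (0.0, 0.5, 0.0),
    "ʌ": (0.5, 1.0, 0.0),
    "o": (0.5, 1.0, 1.0),
    "u": (1.0, 1.0, 1.0),
    "ɯ": (1.0, 1.0, 0.0),
}

# Hypothetical feature targets for L2 (English) vowels absent from the L1.
L2_TARGETS = {
    "æ": (0.2, 0.1, 0.0),
    "ə": (0.5, 0.5, 0.0),
}


def composite(v1, v2):
    """Assumed model: a two-vowel composite has the mean of its parts' features."""
    return tuple((x + y) / 2 for x, y in zip(v1, v2))


def approximate(target, inventory=L1_VOWELS):
    """Return the pair of distinct L1 vowels whose composite is nearest the target."""
    def distance(pair):
        first, second = pair
        return math.dist(composite(inventory[first], inventory[second]), target)

    # Unordered pairs only: averaging makes the order irrelevant in this sketch.
    return min(combinations(inventory, 2), key=distance)


for name, features in L2_TARGETS.items():
    print(name, "≈", "+".join(approximate(features)))
```

With these made-up coordinates the script prints `æ ≈ e+a` and `ə ≈ e+ʌ`. A feature-weighted distance or a trajectory-aware model would be natural refinements, but both go beyond what this summary describes.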

Abstract

Learners of a second language (L2) often map non-native phonemes onto similar native-language (L1) phonemes, which makes conventional L2-focused training slow and effortful. To address this, we propose an L1-grounded pronunciation training method based on compositional phoneme approximation (CPA), a feature-based representation technique that approximates L2 sounds with sequences of L1 phonemes. Evaluations with 20 Korean non-native speakers of English show that, with minimal training, CPA-based training achieves a 76% in-box formant rate in acoustic analysis, a 17.6% relative improvement in phoneme recognition accuracy, and more native-like ratings for over 80% of speech samples. Project page: https://gsanpark.github.io/CPA-Pronunciation.
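The in-box formant rate can be read as the share of vowel productions whose (F1, F2) point lands inside a target rectangle in the F1-F2 plane, as with the red boxes in Figure 3. The following is a minimal sketch assuming one (F1, F2) measurement per production and box bounds in Hz; the `FormantBox` class, the `in_box_rate` helper, the /æ/ bounds, and the sample values are hypothetical, and how the paper chooses the measurement point along a CPA formant trajectory is not stated in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormantBox:
    """Target region in the F1-F2 plane, in Hz (bounds here are hypothetical)."""
    f1_min: float
    f1_max: float
    f2_min: float
    f2_max: float

    def contains(self, f1, f2):
        # Inclusive bounds on both axes.
        return self.f1_min <= f1 <= self.f1_max and self.f2_min <= f2 <= self.f2_max


def in_box_rate(productions, box):
    """Percentage of (F1, F2) measurements that fall inside the target box."""
    if not productions:
        raise ValueError("no productions to score")
    hits = sum(box.contains(f1, f2) for f1, f2 in productions)
    return 100.0 * hits / len(productions)


# Hypothetical /æ/ target box and four made-up productions.
ae_box = FormantBox(f1_min=600, f1_max=850, f2_min=1600, f2_max=2100)
samples = [(700, 1800), (650, 2000), (500, 2200), (800, 1700)]
print(f"{in_box_rate(samples, ae_box):.1f}%")  # → 75.0%
```

Three of the four made-up points fall inside the box, so the rate is 75.0%. Per-condition rates (ENG, KOR, CPA) would follow by scoring each condition's productions separately against the same box.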

Paper Structure

This paper contains 25 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Compositional phoneme approximation represents L2 phonemes absent in the learner’s L1 as composite sounds derived from multiple L1 phonemes.
  • Figure 2: (a) An L2 vowel is approximated by combining two L1 vowels whose features jointly mirror the phonological identity of the target vowel. (b) An L2 consonant is approximated by inserting one or two L1 segments, forming allophones that more closely match the phonological features of the target consonant.
  • Figure 3: Vowel production and formant trajectories for /æ/, /ɔ/, and /ə/. Top: Distributions of speaker productions across conditions (ENG, KOR, CPA), with in-box rates (%). Red boxes show target F1–F2 regions; gray trapezoids indicate canonical vowel space. Bottom: CPA productions shown with spectrograms and smoothed F1 (red) and F2 (blue) trajectories. Shaded bands indicate target formant ranges; arrows show intended transitions.
  • Figure 4: LLM-based word-level nativeness comparison: (a) CPA vs. ENG and (b) CPA vs. KOR. Each cell summarizes the CPA win rate (%) from 18 pairwise comparisons per word and participant. Bars show average win rates across words and participants.
  • Figure 5: An instructional slide for reading CPA-based Korean graphemes used in a 10-minute training session.
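Figure 4 describes a two-level aggregation: each cell holds a CPA win rate computed from 18 pairwise comparisons for one word and one participant, and the bars average those rates across words and participants. The sketch below covers only that bookkeeping, assuming each comparison yields a binary outcome (CPA judged more native-like or not). The LLM judge itself is not modeled, and the participant IDs, word labels, win counts, and helper names (`cell_win_rates`, `average_by`, `make_cell`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

N_COMPARISONS = 18  # pairwise comparisons per word and participant (Figure 4)


def cell_win_rates(outcomes):
    """Map each (participant, word) cell to the % of comparisons CPA won."""
    return {cell: 100.0 * sum(wins) / len(wins) for cell, wins in outcomes.items()}


def average_by(rates, axis):
    """Average cell win rates, grouping by participant (axis=0) or word (axis=1)."""
    groups = defaultdict(list)
    for cell, rate in rates.items():
        groups[cell[axis]].append(rate)
    return {key: mean(values) for key, values in groups.items()}


def make_cell(n_wins, n=N_COMPARISONS):
    """Hypothetical outcome list: True means CPA was judged more native-like."""
    return [True] * n_wins + [False] * (n - n_wins)


# Hypothetical participants, words, and win counts.
outcomes = {
    ("P1", "word_a"): make_cell(18),
    ("P1", "word_b"): make_cell(9),
    ("P2", "word_a"): make_cell(9),
    ("P2", "word_b"): make_cell(0),
}
rates = cell_win_rates(outcomes)
print(average_by(rates, axis=1))  # → {'word_a': 75.0, 'word_b': 25.0}
print(mean(rates.values()))       # → 50.0
```

Averaging per-cell rates (rather than pooling all comparisons) weights every word-participant cell equally; with a fixed 18 comparisons per cell, as the caption states, the two approaches give the same overall mean.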