
PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs

Sana Kang, Myeongseok Gwon, Su Young Kwon, Jaewook Lee, Andrew Lan, Bhiksha Raj, Rita Singh

TL;DR

PhoniTale is a novel cross-lingual mnemonic generation system that performs IPA-based phonological adaptation and syllable-aware alignment to retrieve an L1 keyword sequence, then uses LLMs to generate verbal cues.

Abstract

Vocabulary acquisition poses a significant challenge for second-language (L2) learners, especially for typologically distant language pairs such as English and Korean, where phonological and structural mismatches complicate vocabulary learning. Recently, large language models (LLMs) have been used to generate keyword mnemonics that leverage similar-sounding keywords from a learner's first language (L1) to aid in acquiring L2 vocabulary. However, most methods still rely on direct IPA-based phonetic matching or employ LLMs without phonological guidance. In this paper, we present PhoniTale, a novel cross-lingual mnemonic generation system that performs IPA-based phonological adaptation and syllable-aware alignment to retrieve an L1 keyword sequence and then uses LLMs to generate verbal cues. We evaluate PhoniTale through automated metrics and a short-term recall test with human participants, comparing its output to human-written and prior automated mnemonics. Our findings show that PhoniTale consistently outperforms previous automated approaches and achieves quality comparable to human-written mnemonics.
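The keyword-retrieval step can be illustrated with a minimal sketch. This summary does not state which similarity measure PhoniTale uses to match adapted L1 segments against L1 words, so the example assumes plain Levenshtein distance over phoneme lists; the functions `edit_distance` and `best_keyword` and the toy lexicon are hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def best_keyword(segment: Sequence[str],
                 lexicon: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return the (word, distance) pair whose phonemes are closest to the segment.

    Ties are broken by lexicon insertion order, since min() keeps the first minimum.
    """
    return min(((word, edit_distance(segment, phones))
                for word, phones in lexicon.items()),
               key=lambda pair: pair[1])


# Hypothetical toy lexicon: word labels mapped to phoneme lists.
toy_lexicon = {
    "keyword_a": ["s", "e", "g", "w", "a", "n"],
    "keyword_b": ["t", "a", "n"],
}
segment = ["s", "ɯ", "kʰ", "w", "a", "n"]  # an adapted L1 segment
print(best_keyword(segment, toy_lexicon))
```

A production system would likely weight substitutions by phonetic feature similarity (e.g. treating /kʰ/ and /g/ as closer than /kʰ/ and /a/); unit costs are used here only to keep the sketch short.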

Paper Structure

This paper contains 63 sections, 4 equations, 10 figures, 13 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Problem formulation of the PhoniTale system. Phase 1, keyword sequence retrieval, comprises (a) IPA transliteration, (b) segmentation, and (c) keyword matching. Phase 2, (d), performs verbal cue generation.
  • Figure 2: Visualization of the predicted syllable sequence of the English word autopsy. For /otʰapsi/, the model assigns high boundary probabilities after o, p, and i, segmenting the sequence into [o, tʰap, si].
  • Figure 3: Mean correctness scores by participant group. Error bars indicate standard error.
  • Figure 4: Four key challenges in English-Korean phonological alignment: (1) Dimensional structure mismatch: Korean's two-dimensional syllabic blocks versus English's linear sequence. (2) Syllable expansion due to consonant cluster resolution. (3) Phoneme transformation: Korean lacks certain English distinctions while English lacks Korean's three-way consonant contrast. (4) Phonemic contrast differences: Korean's systematic three-way distinction versus English's position-dependent allophones.
  • Figure 5: We demonstrate this two-phase pipeline through a running example of $w_{\mathrm{L2}}$ "squander". The system first converts the word into $P_{\mathrm{L2}}$ (/skˈwɑndər/) using the eng-to-ipa library [eng_to_ipa], which is based on the CMU Pronouncing Dictionary [cmu_dictionary]. The system then generates $\widehat{P}_{\mathrm{L1}}$ (/sɯkʰwant/), predicts the syllable sequence (/sɯ/, /kʰwan/, /t/), and derives the segments (/sɯkʰwan/, /t/). It then retrieves $\mathcal{W}_{\mathrm{L1}}$ with IPA transcriptions /sæ .gwan/ and /t/, and uses them to construct a verbal cue: "sE.gwan E.s si.gan. l. tnaN.bi.Et.t*a" (English translation: Wasted more time at customs).
  • ...and 5 more figures
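The boundary-based segmentation in Figure 2 can be sketched as follows. The sketch assumes the model emits one boundary probability per phoneme (the probability that a syllable ends after it) and that a boundary is placed wherever this probability exceeds a fixed threshold of 0.5; the threshold, the function name `segment_by_boundaries`, and the probability values in the usage line are hypothetical, chosen only to reproduce the autopsy example.

```python
from typing import List, Sequence


def segment_by_boundaries(phonemes: Sequence[str],
                          boundary_probs: Sequence[float],
                          threshold: float = 0.5) -> List[str]:
    """Split a phoneme sequence into syllables at high-probability boundaries.

    boundary_probs[i] is the (assumed) probability that a syllable ends
    right after phonemes[i].
    """
    if len(phonemes) != len(boundary_probs):
        raise ValueError("need exactly one boundary probability per phoneme")
    syllables: List[str] = []
    current: List[str] = []
    for phone, prob in zip(phonemes, boundary_probs):
        current.append(phone)
        if prob > threshold:  # boundary predicted after this phoneme
            syllables.append("".join(current))
            current = []
    if current:  # flush trailing phonemes that lack a final boundary
        syllables.append("".join(current))
    return syllables


# Hypothetical probabilities: high after "o", "p", and "i" (tʰ = aspirated t).
print(segment_by_boundaries(["o", "tʰ", "a", "p", "s", "i"],
                            [0.9, 0.1, 0.2, 0.8, 0.1, 0.95]))
```

Flushing the trailing phonemes guarantees that every phoneme lands in some syllable even when the model does not assign a high probability to the final position.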