
Cross-Lingual IPA Contrastive Learning for Zero-Shot NER

Jimin Sohn, David R. Mortensen

TL;DR

The paper addresses zero-shot NER for low-resource languages by bridging cross-lingual phonemic gaps. It introduces CONLIPA, a dataset of IPA word pairs, each pairing an English word with a similarly pronounced word from one of ten high-resource languages drawn from 10 language families, and IPAC, a cross-lingual IPA contrastive learning objective that aligns phonemic representations using the InfoNCE loss. The method is built on strong pre-trained phonemic models (e.g., XPhoneBERT) with a LoRA adapter and a projection layer, and is evaluated on WikiANN NER, using the ten high-resource languages as transfer sources. Results show consistent improvements over baselines in three zero-shot cases, demonstrating the effectiveness of phonemic alignment for cross-lingual generalization and offering a path toward better NLP for low-resource languages.
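For reference, the InfoNCE objective and the LoRA adapter mentioned above are commonly written as follows. The notation is ours, not the paper's: f is the phonemic encoder (e.g., XPhoneBERT with LoRA), g is the projection layer, z_i^en and z_i^ℓ are the projected embeddings of the i-th English word and its similarly pronounced counterpart in language ℓ, N is the batch size, and τ is a temperature. The paper's exact equations (choice of similarity, temperature, whether the loss is symmetrized, LoRA scaling) may differ from this standard form.

```latex
% In-batch InfoNCE: pair i is the positive; all j \neq i in the batch are negatives.
\mathcal{L}_{\text{InfoNCE}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log \frac{\exp\big(\operatorname{sim}(z_i^{\text{en}}, z_i^{\ell}) / \tau\big)}
              {\sum_{j=1}^{N} \exp\big(\operatorname{sim}(z_j^{\text{en}}, z_j^{\ell})\big|_{\text{anchor } i}\big)}
% written out explicitly:
\mathcal{L}_{\text{InfoNCE}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log \frac{\exp\big(\operatorname{sim}(z_i^{\text{en}}, z_i^{\ell}) / \tau\big)}
              {\sum_{j=1}^{N} \exp\big(\operatorname{sim}(z_i^{\text{en}}, z_j^{\ell}) / \tau\big)},
\qquad
z = g\big(f(x)\big),
\qquad
\operatorname{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}

% Standard LoRA update of a frozen weight W_0 \in \mathbb{R}^{d \times k}
% (only A and B are trained):
W = W_0 + \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Minimizing this loss raises the similarity of each English word to its own counterpart relative to every other target-language word in the batch, which is the alignment effect the method relies on.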

Abstract

Existing approaches to zero-shot Named Entity Recognition (NER) for low-resource languages have primarily relied on machine translation, whereas more recent methods have shifted their focus to phonemic representation. Building on this, we investigate how reducing the gap between the IPA-transcribed phonemic representations of languages with similar phonetic characteristics enables models trained on high-resource languages to perform effectively on low-resource languages. In this work, we propose the CONtrastive Learning with IPA (CONLIPA) dataset, which contains IPA pairs of English and 10 high-resource languages drawn from 10 frequently used language families. We also propose a cross-lingual IPA contrastive learning method (IPAC) that is trained on the CONLIPA dataset. Together, the proposed dataset and method yield a substantial average gain over the best-performing baseline.

Paper Structure

This paper contains 29 sections, 2 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Concept Figure. As shown in (A), existing phonemic models struggle to recognize the same word when IPA representations differ across languages, despite similar pronunciations. In contrast, our method (B) uses IPA contrastive learning to align representations of languages with similar pronunciations, particularly for high-resource languages. This enables effective zero-shot inference for low-resource languages, demonstrating strong generalization.
  • Figure 2: Samples in our CONLIPA dataset for each language.
  • Figure 3: Overall architecture of our IPA Contrastive Learning (IPAC). First, the IPA representations of word pairs with similar pronunciations are obtained from the phonemic encoder for two high-resource languages, such as English and Hindi. Then, these pairs are considered positive pairs, while the remaining samples in the batch are treated as negative pairs to compute the contrastive loss.
  • Figure 4: t-SNE (perplexity=2) visualization using 10 eng-ori pairs and 10 eng-khm pairs. Panels (a) and (c) show the results before IPA contrastive learning, while panels (b) and (d) show the results after learning. Dots of the same color indicate pairs of English and target-language words with the same meaning.
  • Figure 5: Ten eng-ori pairs with similar pronunciations.
  • ...and 1 more figure
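Figure 3 describes the core training signal: for a batch of English/target-language word pairs with similar pronunciations, each pair is a positive and every other pairing in the batch is a negative. The following is a minimal, dependency-free sketch of that in-batch InfoNCE loss, not the authors' code. It assumes cosine similarity, a single English-to-target direction, a temperature `tau` of 0.07, and a bias-free linear projection head; the names `project`, `info_nce`, and the toy vectors are hypothetical stand-ins for the phonemic encoder outputs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def project(h: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical bias-free linear projection head: returns weight @ h."""
    return [sum(w * x for w, x in zip(row, h)) for row in weight]


def info_nce(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             tau: float = 0.07) -> float:
    """In-batch InfoNCE (assumed form).

    anchors[i] (e.g., an English IPA embedding) and positives[i] (its similarly
    pronounced target-language counterpart) form the positive pair; every
    positives[j] with j != i acts as a negative for anchors[i].
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need a non-empty batch of aligned pairs")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / tau for pj in p]
        # Numerically stable log-sum-exp over the whole batch (the denominator).
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the diagonal entry (the true pair) as the target.
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    # Toy 2-D "embeddings": aligned pairs should give a lower loss than swapped ones.
    eng = [[1.0, 0.0], [0.0, 1.0]]
    tgt_aligned = [[0.9, 0.1], [0.1, 0.9]]
    tgt_swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce(eng, tgt_aligned) < info_nce(eng, tgt_swapped))  # prints True
```

In an actual training setup the vectors would come from the phonemic encoder passed through the projection head, and the loss would be computed in an autodiff framework so that gradients reach only the LoRA and projection parameters. A symmetric variant that also averages the target-to-English direction is a common alternative; which one the paper uses is not stated here.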