Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages

David Demitri Africa; Suchir Salhan; Yuval Weiss; Paula Buttery; Richard Diehl Martinez

TL;DR

This paper tackles zero-shot cross-lingual NER for low-resource Philippine languages by meta-pretraining small decoder LMs with a first-order MAML objective, aiming to produce fast-adapting representations without exposure to Tagalog or Cebuano. The authors implement a hybrid pretraining regime on Pico decoders at four sizes and attach an untrained CRF head for high-resource finetuning before zero-shot evaluation on Tagalog and Cebuano. They report consistent zero-shot micro-F1 gains (2–6 points with head-only tuning, 1–3 points with full tuning). The largest improvements are for single-token person entities that co-occur with surface cues such as Tagalog case particles; gains are more pronounced for smaller models and tend to diminish with scale. Qualitative analyses indicate that meta-pretraining sharpens lexical prototypes and increases reliance on surface cues. The analyses also identify limitations related to multi-token entities and capacity constraints, suggesting avenues for broader language coverage and alternative meta-objectives.

Abstract

Named-entity recognition (NER) in low-resource languages is usually tackled by finetuning very large multilingual LMs, an option that is often infeasible in memory- or latency-constrained settings. We ask whether small decoder LMs can be pretrained so that they adapt quickly and transfer zero-shot to languages unseen during pretraining. To this end, we replace part of the autoregressive objective with first-order model-agnostic meta-learning (MAML). Tagalog and Cebuano are typologically similar yet structurally different in their actor/non-actor voice systems, and hence serve as a challenging test-bed. Across four model sizes (11M–570M parameters), MAML lifts zero-shot micro-F1 by 2–6 pp under head-only tuning and 1–3 pp after full tuning, while cutting convergence time by up to 8%. Gains are largest for single-token person entities that co-occur with the Tagalog case particles si/ni, highlighting the importance of surface anchors.
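
To make the first-order MAML objective mentioned above concrete, here is a minimal sketch on a toy problem. It is not the authors' Pico implementation: the scalar regression model, the task sampler, and all names (`fomaml_step`, `make_task`, the learning rates) are assumptions chosen for illustration. Each task is adapted on a support set with a few gradient steps, and the meta-update then uses the query-set gradient evaluated at the adapted weights, dropping the second-order term that full MAML would include.

```python
import random


def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the scalar model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def fomaml_step(w, tasks, inner_lr=0.05, outer_lr=0.1, inner_steps=1):
    """One first-order MAML meta-update over a batch of tasks.

    Each task is a tuple (support_x, support_y, query_x, query_y).
    All hyperparameter values here are illustrative assumptions.
    """
    meta_grad = 0.0
    for sx, sy, qx, qy in tasks:
        # Inner loop: adapt a copy of the shared weight on the support set.
        w_task = w
        for _ in range(inner_steps):
            w_task -= inner_lr * grad_mse(w_task, sx, sy)
        # First-order approximation: use the query gradient at the adapted
        # weight directly, without differentiating through the inner loop.
        meta_grad += grad_mse(w_task, qx, qy)
    # Outer loop: move the shared initialisation along the averaged gradient.
    return w - outer_lr * meta_grad / len(tasks)


def make_task(slope, rng, n=8):
    """Hypothetical task sampler: noiseless points on the line y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    ys = [slope * x for x in xs]
    return xs[:n], ys[:n], xs[n:], ys[n:]


rng = random.Random(0)
w_meta = 0.0
for _ in range(200):
    batch = [make_task(rng.uniform(1.5, 2.5), rng) for _ in range(4)]
    w_meta = fomaml_step(w_meta, batch)
print(f"meta-learned initialisation: {w_meta:.3f}")
```

In this toy setting the meta-learned weight settles among the task slopes, so a single inner step is enough to fit any new task from the same family. The paper applies the same idea to language-model pretraining, where the analogous payoff is faster adaptation during NER finetuning.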
