Self-Augmented In-Context Learning for Unsupervised Word Translation
Yaoyiran Li, Anna Korhonen, Ivan Vulić
TL;DR
This work tackles unsupervised bilingual lexicon induction (BLI) by making more effective use of large language models. It introduces Self-Augmented In-Context Learning (SAIL), which starts from zero-shot prompts, iteratively extracts high-confidence word translation pairs, and refines them through few-shot in-context learning; the resulting seed lexicon is then used to improve translation inference. On the XLING and PanLex-BLI benchmarks, SAIL achieves state-of-the-art unsupervised BLI performance and consistently outperforms established mapping-based baselines, with statistically significant gains. The results highlight the potential of self-augmented in-context strategies for cross-lingual lexicon induction. The authors also acknowledge limitations related to language coverage and computational requirements. Code is publicly available for reproducibility and further study.
Abstract
Recent work has shown that, while large language models (LLMs) demonstrate strong word translation or bilingual lexicon induction (BLI) capabilities in few-shot setups, they still cannot match the performance of 'traditional' mapping-based approaches in the unsupervised scenario where no seed translation pairs are available, especially for lower-resource languages. To address this challenge with LLMs, we propose self-augmented in-context learning (SAIL) for unsupervised BLI: starting from a zero-shot prompt, SAIL iteratively induces a set of high-confidence word translation pairs for in-context learning (ICL) from an LLM, which it then reapplies to the same LLM in the ICL fashion. Our method shows substantial gains over zero-shot prompting of LLMs on two established BLI benchmarks spanning a wide range of language pairs, also outperforming mapping-based baselines across the board. In addition to achieving state-of-the-art unsupervised BLI performance, we also conduct comprehensive analyses on SAIL and discuss its limitations.
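The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how "high-confidence" pairs are selected, so the sketch assumes a back-translation consistency check as one plausible filter. The names `sail`, `Translator`, and `toy_translate` are hypothetical, and the LLM is replaced by a toy lookup table so that the example runs on its own.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
# Hypothetical interface standing in for an LLM prompt:
# translate(word, direction, in_context_examples) -> predicted translation.
Translator = Callable[[str, str, List[Pair]], str]


def sail(translate: Translator, source_words: List[str],
         n_iterations: int = 1) -> List[Pair]:
    """Build a seed lexicon without any gold translation pairs."""
    seed: List[Pair] = []  # empty seed: the first pass is zero-shot
    for _ in range(n_iterations):
        new_seed: List[Pair] = []
        # In-context examples for the reverse direction are the flipped pairs.
        reversed_seed = [(tgt, src) for src, tgt in seed]
        for word in source_words:
            target = translate(word, "src->tgt", seed)
            back = translate(target, "tgt->src", reversed_seed)
            # ASSUMPTION: a pair counts as high-confidence only if
            # translating back recovers the original source word.
            if target and back == word:
                new_seed.append((word, target))
        seed = new_seed  # later passes prompt with these pairs (few-shot)
    return seed


def toy_translate(word: str, direction: str, examples: List[Pair]) -> str:
    """Toy stand-in for an LLM; it ignores the in-context examples."""
    forward = {"hund": "dog", "katze": "cat", "bank": "bench"}
    backward = {"dog": "hund", "cat": "katze", "bench": "sitzbank"}
    table = forward if direction == "src->tgt" else backward
    return table.get(word, "")


seed_lexicon = sail(toy_translate, ["hund", "katze", "bank"])
print(seed_lexicon)  # [('hund', 'dog'), ('katze', 'cat')]

# Final inference: the induced pairs serve as in-context examples.
print(toy_translate("bank", "src->tgt", seed_lexicon))  # bench
```

In this toy run, "bank" is dropped from the seed lexicon because its back-translation ("sitzbank") does not match the source word, while the two consistent pairs survive and are reused as in-context examples at inference time.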
