SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
Chenming Tang, Zhixiang Wang, Yunfang Wu
TL;DR
In-context learning (ICL) performance in machine translation (MT) depends heavily on the quality of the demonstrations. The paper addresses this with SCOI, a syntax-augmented strategy for selecting in-context examples. SCOI measures set-level syntactic coverage through a simplified tree-to-polynomial representation, measures lexical coverage through word overlap, and alternates between the two measures in a greedy selection process. The simplified tree-to-polynomial algorithm runs in quadratic time, which keeps syntactic encoding scalable. In experiments across six translation directions with XGLM and Alpaca, SCOI achieves the highest average COMET score among learning-free baselines and often surpasses the learning-based CTQ Scorer. The work shows that adding syntactic information to ICL example selection for MT yields meaningful gains, and it provides code and artifacts for further research.
Abstract
In-context learning (ICL) greatly improves the performance of large language models (LLMs) on various downstream tasks, but the improvement depends heavily on the quality of the demonstrations. In this work, we introduce syntactic knowledge to select better in-context examples for machine translation (MT). We propose a new strategy, Syntax-augmented COverage-based In-context example selection (SCOI), which leverages deep syntactic structure beyond conventional word matching. Specifically, we measure set-level syntactic coverage by computing the coverage of polynomial terms with the help of a simplified tree-to-polynomial algorithm, and we measure lexical coverage using word overlap. Furthermore, we devise an alternating selection approach that combines both coverage measures, taking advantage of syntactic and lexical information. We conduct experiments with two multilingual LLMs on six translation directions. Empirical results show that SCOI obtains the highest average COMET score among all learning-free methods, indicating that combining syntactic and lexical coverage helps to select better in-context examples for MT. Our code is available at https://github.com/JamyDon/SCOI.
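
The alternating, coverage-based selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each sentence has already been reduced to a set of syntactic features (stand-ins for polynomial terms, which the real method derives from a parse tree) and a set of words. The function names, the choice to start with the syntactic measure, the tie-breaking rule, and the way covered features are tracked are all assumptions made for illustration.

```python
from typing import List, Set


def coverage_gain(candidate: Set[str], target: Set[str], covered: Set[str]) -> int:
    """Count the target features this candidate would newly cover."""
    return len((candidate & target) - covered)


def alternating_greedy_select(
    test_syntax: Set[str],
    test_words: Set[str],
    cand_syntax: List[Set[str]],
    cand_words: List[Set[str]],
    k: int,
) -> List[int]:
    """Greedily pick up to k candidate indices, alternating the coverage measure.

    Hypothetical helper: even steps rank by syntactic coverage, odd steps by
    lexical coverage (the starting measure is an assumption).
    """
    selected: List[int] = []
    covered_syntax: Set[str] = set()
    covered_words: Set[str] = set()
    for step in range(min(k, len(cand_syntax))):
        use_syntax = step % 2 == 0
        target = test_syntax if use_syntax else test_words
        feats = cand_syntax if use_syntax else cand_words
        covered = covered_syntax if use_syntax else covered_words
        remaining = [i for i in range(len(feats)) if i not in selected]
        # max() returns the first maximal element, so ties go to the lowest index.
        best = max(remaining, key=lambda i: coverage_gain(feats[i], target, covered))
        selected.append(best)
        # Assumption: a chosen example counts toward both coverage sets.
        covered_syntax |= cand_syntax[best] & test_syntax
        covered_words |= cand_words[best] & test_words
    return selected


# Toy usage with made-up feature sets for one test sentence and three candidates.
test_syntax = {"a", "b", "c"}
test_words = {"the", "cat", "sat"}
cand_syntax = [{"a"}, {"a", "b"}, {"c"}]
cand_words = [{"the", "cat", "sat"}, {"dog"}, {"cat"}]
picked = alternating_greedy_select(test_syntax, test_words, cand_syntax, cand_words, k=2)
print(picked)
```

In this toy run the first pick is driven by syntactic coverage and the second by word overlap, so the selected set covers both kinds of information. That is the motivation the abstract gives for combining the two measures.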
