SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation

Chenming Tang; Zhixiang Wang; Yunfang Wu

TL;DR

The paper addresses the dependence of in-context learning (ICL) performance on demonstration quality in machine translation (MT) by introducing SCOI, a syntax-augmented example selection strategy. SCOI measures set-level syntactic coverage via a simplified tree-to-polynomial representation and set-level lexical coverage via word overlap, and alternates between the two measures in a greedy selection process. The simplified tree-to-polynomial algorithm runs in quadratic time, which keeps syntactic encoding scalable. In experiments on six translation directions with XGLM and Alpaca, SCOI achieves the highest average COMET score among learning-free baselines and often surpasses the learning-based CTQ Scorer. The work shows that incorporating syntactic information into ICL for MT yields meaningful gains, and it provides code and artifacts for further research.

Abstract

In-context learning (ICL) greatly improves the performance of large language models (LLMs) on various downstream tasks, but the improvement depends heavily on the quality of the demonstrations. In this work, we introduce syntactic knowledge to select better in-context examples for machine translation (MT). We propose a new strategy, Syntax-augmented COverage-based In-context example selection (SCOI), which leverages deep syntactic structure beyond conventional word matching. Specifically, we measure set-level syntactic coverage by computing the coverage of polynomial terms with the help of a simplified tree-to-polynomial algorithm, and we measure lexical coverage using word overlap. Furthermore, we devise an alternating selection approach that combines both coverage measures, taking advantage of syntactic and lexical information. We conduct experiments with two multilingual LLMs on six translation directions. Empirical results show that SCOI obtains the highest average COMET score among all learning-free methods, indicating that combining syntactic and lexical coverage helps to select better in-context examples for MT. Our code is available at https://github.com/JamyDon/SCOI.
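To make "set-level coverage" concrete, the sketch below scores a whole set of examples against a test input rather than scoring each example on its own. It is an illustration under stated assumptions, not the paper's implementation: the function name `set_level_coverage` is hypothetical, coverage is assumed to be the fraction of distinct test-input terms that appear in at least one example, and a "term" may be a word (lexical view) or a polynomial term (syntactic view). The exact definition used by SCOI may differ.

```python
from typing import Iterable, Set


def set_level_coverage(test_terms: Set[str],
                       example_term_sets: Iterable[Set[str]]) -> float:
    """Fraction of distinct test terms found in at least one example.

    Assumption: terms are plain strings. For the lexical view they are
    words; for the syntactic view they would be polynomial terms
    produced by some tree-to-polynomial encoder (not shown here).
    """
    if not test_terms:
        return 0.0
    covered: Set[str] = set()
    for terms in example_term_sets:
        # A term counts once, no matter how many examples contain it.
        covered |= test_terms & terms
    return len(covered) / len(test_terms)


# Lexical view: words of the test input versus words of two examples.
test_words = set("the cat sat on the mat".split())
examples = [set("the dog sat".split()), set("a cat on a rug".split())]
score = set_level_coverage(test_words, examples)
```

Because the score is computed over the union of the examples, a second example only raises it when it covers terms that the first one missed, which is what rewards complementary demonstrations.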

Paper Structure (51 sections, 27 equations, 2 figures, 14 tables, 1 algorithm)

Introduction
Related Work
Method
Polynomial Representation of Syntactic Structure
Measure of Set-level Syntactic Coverage
Measure of Set-level Lexical Coverage
Combining Syntactic and Lexical Coverage
Experimental Setup
Datasets and Evaluation Metrics
Test Set
Example Database
Evaluation Metrics
Pre-processing
Large Language Models
Implementation Details
...and 36 more sections

Figures (2)

Figure 1: Overview of SCOI. Each example is selected based on how well the test input is covered by the current candidate plus the examples selected in previous steps, alternating between the syntax level and the word level. In each step, $T$ denotes the test input, $e_i$ the $i$-th selected example, $\oplus$ the concatenation of the selected examples and one candidate, $c_i$ the $i$-th candidate from the example database, and $S_i$ the to-be-scored set consisting of the selected examples plus the $i$-th candidate.
Figure 2: An example tree with $t+2$ layers and $4t+3$ nodes. $m^j_i$ denotes the $j$-th node on the $i$-th layer.
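The selection loop described in the Figure 1 caption can be sketched as a greedy procedure that switches between two views of the data at each step. This is a minimal sketch under assumptions, not the authors' code: the names `coverage` and `select_examples` are hypothetical, each candidate is assumed to arrive as a pair of precomputed term sets (syntactic terms, words), the syntactic step is assumed to come first, and ties go to the earliest candidate.

```python
from typing import List, Set, Tuple

Candidate = Tuple[Set[str], Set[str]]  # (syntactic terms, words)


def coverage(test_terms: Set[str], term_sets: List[Set[str]]) -> float:
    """Fraction of test terms covered by the union of the given sets."""
    if not test_terms:
        return 0.0
    return len(set().union(*term_sets) & test_terms) / len(test_terms)


def select_examples(test: Candidate,
                    candidates: List[Candidate],
                    k: int) -> List[int]:
    """Greedily pick up to k candidate indices, alternating views."""
    selected: List[int] = []
    for step in range(k):
        view = step % 2  # 0 = syntax level, 1 = word level (assumed order)
        already = [candidates[i][view] for i in selected]
        best, best_score = None, -1.0
        for i, cand in enumerate(candidates):
            if i in selected:
                continue
            # Score the set S_i: selected examples plus candidate c_i.
            score = coverage(test[view], already + [cand[view]])
            if score > best_score:
                best, best_score = i, score
        if best is None:  # database exhausted
            break
        selected.append(best)
    return selected


# Toy data: the term strings are placeholders, not real polynomial terms.
test_input = ({"a", "b", "c"}, {"x", "y", "z"})
database = [
    ({"c"}, {"q"}),
    ({"a", "b"}, {"x"}),
    ({"a"}, {"x", "y"}),
]
order = select_examples(test_input, database, k=3)
```

In this toy run the first pick is driven by syntactic terms, the second by words, and the third by syntactic terms again, so each step fills in whatever the previously selected examples left uncovered in the active view.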
