Enhancing Contrastive Demonstration Selection with Semantic Diversity for Robust In-Context Machine Translation

Owen Patterson, Chee Ng

TL;DR

In-context learning performance is highly sensitive to demonstration selection, particularly for machine translation. The authors introduce DiverseConE, a three-step pipeline that combines TopK similarity, contrastive example selection (ConE), and a diversity-enhancement step to produce diverse, informative demonstrations. Through extensive experiments with the Llama2-7b model on four language pairs in 1-shot and 3-shot settings, DiverseConE consistently outperforms baselines including random, BM25, TopK, and TopK+ConE, as measured by COMET20/COMET22; analyses of diversity and human evaluation corroborate the gains. The work highlights the importance of demonstration diversity in ICL for MT and suggests broader applicability to other tasks and models.

Abstract

In-Context Learning (ICL) empowers large language models to perform tasks by conditioning on a few input-output examples. However, the performance of ICL is highly sensitive to the selection of these demonstrations. While existing methods focus on similarity or contrastive selection, they often overlook the importance of diversity among the chosen examples. In this paper, we propose DiverseConE (Diversity-Enhanced Contrastive Example Selection), a novel approach for demonstration selection in in-context learning for machine translation. Our method builds upon contrastive selection by incorporating a diversity enhancement step based on embedding space dissimilarity. We conduct extensive experiments on the Llama2-7b model across four language pairs (English-Chinese, Chinese-English, Russian-German, German-Russian) in 1-shot and 3-shot settings, using COMET20 and COMET22 for evaluation. Our results demonstrate that DiverseConE consistently outperforms strong baseline methods, including random selection, BM25, TopK, and a state-of-the-art contrastive selection method. Further analysis, including diversity metrics and human evaluation, validates the effectiveness of our approach and highlights the benefits of considering demonstration diversity for improved translation quality.
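The diversity enhancement step is described above only as being "based on embedding space dissimilarity". As a rough illustration of that idea, the sketch below first takes a TopK pool by cosine similarity to the query and then picks demonstrations greedily, penalizing candidates that sit close to those already chosen. The greedy rule is a maximal-marginal-relevance style heuristic substituted here for illustration; it is an assumption, not the paper's stated procedure. The names `select_diverse`, `top_k`, `n_shots`, and `trade_off` are hypothetical, and the contrastive (ConE) scoring stage is omitted because it requires model calls.

```python
import numpy as np


def cosine_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def select_diverse(query_emb, cand_embs, top_k=4, n_shots=3, trade_off=0.5):
    """Hypothetical helper: TopK retrieval followed by greedy diverse selection.

    trade_off weights similarity to the query against redundancy with the
    demonstrations already selected (1.0 means similarity only).
    """
    # Step 1 (TopK): keep the top_k candidates most similar to the query.
    sims = cosine_sim(query_emb[None, :], cand_embs)[0]
    pool = list(np.argsort(-sims)[:top_k])

    # Seed with the single most similar candidate.
    selected = [pool.pop(0)]

    # Step 2 (assumed diversity rule): repeatedly add the candidate that is
    # relevant to the query but least similar to what is already selected.
    while pool and len(selected) < n_shots:
        pair = cosine_sim(cand_embs[pool], cand_embs[selected])
        redundancy = pair.max(axis=1)  # closeness to the nearest selected item
        scores = trade_off * sims[pool] - (1.0 - trade_off) * redundancy
        best = int(np.argmax(scores))
        selected.append(pool.pop(best))

    return [int(i) for i in selected]
```

With `trade_off=1.0` the function reduces to plain TopK ordering; lower values make it skip near-duplicate candidates in favor of ones that cover a different region of the embedding space.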

Paper Structure

This paper contains 18 sections, 6 equations, and 6 tables.