
TreePrompt: Leveraging Hierarchical Few-Shot Example Selection for Improved English-Persian and English-German Translation

Ramtin Kakavand, Ebrahim Ansari

TL;DR

TreePrompt advances few-shot machine translation by incorporating LLM-driven quality judgments into a hierarchical, tree-structured example selection process. It combines LLM preference labeling with K-NN-based semantic expansion over RoBERTa embeddings, and pairs the result with AFSP or K-NN to balance example quality against query similarity. Evaluations on MIZAN (English–Persian) and WMT19 (English–German) show that TreePrompt can outperform baseline prompts, often with fewer but higher-quality examples. Results vary by language pair and model, however, and COMET scores can be negative for the low-resource pair. The approach shows the practical potential of model-guided example selection for improving translation quality in low-resource settings. The authors also note its computational cost and the need for further validation with larger datasets and human evaluation.
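The tree-structured selection described above can be pictured as a breadth-first expansion. Each candidate example is labeled by a quality judge, and only positively labeled examples have their nearest neighbours added as children. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The `judge` callable is a hypothetical stand-in for the LLM preference label, assumed here to return 1, 0, or -1. Plain float vectors stand in for RoBERTa sentence embeddings. The names `tree_select`, `k_nearest`, and `max_labeled` are illustrative.

```python
import math
from collections import deque
from typing import Callable, Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def k_nearest(idx: int, embeddings: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k examples most similar to example `idx`, excluding itself."""
    scored = [(cosine(embeddings[idx], e), j) for j, e in enumerate(embeddings) if j != idx]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by index
    return [j for _, j in scored[:k]]


def tree_select(
    embeddings: List[Sequence[float]],
    judge: Callable[[int], int],  # hypothetical LLM stand-in: returns 1, 0 or -1
    seeds: List[int],
    k: int = 2,
    max_labeled: int = 50,  # hypothetical budget on judge (LLM) calls
) -> List[int]:
    """Grow a tree from `seeds`: label each node once, expand only nodes labeled 1."""
    labels: Dict[int, int] = {}
    queue = deque(seeds)
    while queue and len(labels) < max_labeled:
        node = queue.popleft()
        if node in labels:  # the same example may be reached from several parents
            continue
        labels[node] = judge(node)
        if labels[node] == 1:
            # Children of a preferred example are its nearest neighbours in embedding space.
            queue.extend(j for j in k_nearest(node, embeddings, k) if j not in labels)
    return sorted(i for i, lab in labels.items() if lab == 1)


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
    print(tree_select(emb, lambda i: 1 if i in {0, 1, 2} else -1, seeds=[0, 3]))  # [0, 1, 2]
```

In the toy run, seed 3 is rejected, so its neighbour 4 is never sent to the judge. This is the property that lets a tree-shaped search spend labeling calls only in regions the judge already favours.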

Abstract

Large Language Models (LLMs) have consistently demonstrated strong performance in machine translation, especially when guided by high-quality prompts. Few-shot prompting is an effective technique for improving translation quality. However, most existing example selection methods focus solely on query-to-example similarity and do not account for the quality of the examples. In this work, we propose TreePrompt, a novel example selection approach that learns LLM preferences to identify high-quality, contextually relevant examples within a tree-structured framework. To further explore the balance between similarity and quality, we combine TreePrompt with K-Nearest Neighbors (K-NN) and Adaptive Few-Shot Prompting (AFSP). Evaluations on two language pairs, English-Persian (MIZAN) and English-German (WMT19), show that integrating TreePrompt with AFSP or Random selection leads to improved translation performance.
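One way to read the combination described in the abstract: TreePrompt first yields a quality-filtered pool, and a second step then picks the few-shot examples for each query from that pool. The sketch below shows two such second steps under that assumption. The first is K-NN, which ranks pool members by cosine similarity to the query. The second is Random, which samples uniformly. AFSP is not sketched here because its procedure is not described in this section. The functions `knn_from_pool` and `random_from_pool` are hypothetical. Plain vectors stand in for sentence embeddings.

```python
import math
import random
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_from_pool(
    query_emb: Sequence[float],
    pool: List[int],
    embeddings: List[Sequence[float]],
    k: int,
) -> List[int]:
    """Hypothetical K-NN step: the k pool examples closest to the query."""
    ranked = sorted(pool, key=lambda i: (-cosine(query_emb, embeddings[i]), i))
    return ranked[:k]


def random_from_pool(pool: List[int], k: int, seed: int = 0) -> List[int]:
    """Hypothetical Random step: k distinct pool examples drawn uniformly (seeded)."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
    pool = [0, 1, 2]  # e.g. the examples a tree search labeled as high quality
    print(knn_from_pool([1.0, 0.0], pool, emb, k=2))  # [0, 1]
    print(random_from_pool(pool, k=2))
```

Restricting both strategies to the same pool makes the trade-off explicit. K-NN adds query similarity on top of the quality filter, while Random relies on the quality filter alone.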


Paper Structure

This paper contains 30 sections, 5 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: An overview of the proposed Tree-based example selection approach for few-shot translation prompting
  • Figure 2: Compact prompt used for scoring translation examples
  • Figure 3: TreePrompt: Tree-Based Example Selection Algorithm
  • Figure 4: Prompt used for few-shot translation
  • Figure 5: Evaluation results for GPT-4o, GPT-3.5 Turbo, and DeepSeek across different prompting methods on the MIZAN dataset (English-Persian), sorted by COMET score (the primary metric). The highest scores and best methods are shown in bold.
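Figures 2 and 4 refer to two prompts: one for scoring candidate translation examples and one for few-shot translation. Their exact wording is not reproduced in this section. The sketch below therefore uses hypothetical templates to show the general shape of each step. A scoring prompt asks for a label, a parser maps the model's reply to 1, 0, or -1, and a translation prompt lists the selected examples before the query. The function names, the template text, and the three-way label scheme are all assumptions.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, reference translation)


def scoring_prompt(pair: Pair, src_lang: str, tgt_lang: str) -> str:
    """Hypothetical scoring template: ask the LLM to rate one example pair."""
    src, tgt = pair
    return (
        f"Rate how useful this {src_lang}-{tgt_lang} pair is as a translation example.\n"
        f"{src_lang}: {src}\n{tgt_lang}: {tgt}\n"
        "Answer with 1 (good), 0 (neutral) or -1 (bad)."
    )


def parse_score(reply: str) -> int:
    """Map the model's reply to a label; anything unrecognised counts as 0."""
    stripped = reply.strip()
    token = stripped.split()[0] if stripped else ""
    return int(token) if token in {"1", "0", "-1"} else 0


def translation_prompt(examples: List[Pair], query: str, src_lang: str, tgt_lang: str) -> str:
    """Hypothetical few-shot template: instruction, selected examples, then the query."""
    blocks = [f"Translate the following {src_lang} text into {tgt_lang}."]
    for src, tgt in examples:
        blocks.append(f"{src_lang}: {src}\n{tgt_lang}: {tgt}")
    blocks.append(f"{src_lang}: {query}\n{tgt_lang}:")  # the model completes this line
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(translation_prompt([("Hello", "Hallo")], "Good morning", "English", "German"))
    print(parse_score(" -1 poor alignment"))  # -1
```

Treating unparseable replies as neutral (0) is a conservative choice. A malformed answer then neither promotes an example nor expands the tree from it.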