Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval

Yan Gao; Zhiwei Cao; Zhongjian Miao; Baosong Yang; Shiyu Liu; Min Zhang; Jinsong Su

TL;DR

This paper addresses the latency of per-timestep retrieval in non-parametric NMT. It reveals the limitations of $\lambda$-based retrieval skipping in $k$NN-MT-AR and introduces $k$NN-MT-DR. The core contribution is a binary MLP classifier that explicitly decides whether to perform $k$NN retrieval at each decoding step, guided by carefully chosen scalar features and a timestep-aware threshold $\alpha_t$. The approach yields substantial decoding-speed gains with minimal loss in translation quality. It also remains compatible with datastore compression and adaptive $k$NN frameworks. Empirical results on multi-domain German-English and cross-language tasks demonstrate improved efficiency and robust performance, making non-parametric domain adaptation more practical in diverse settings.
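The per-step decision described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper's actual scalar features and its formula for $\alpha_t$ are not given in this summary, so several pieces are hypothetical placeholders:

- the two features (the top-1 NMT probability and the entropy of the NMT distribution);
- the MLP shape and its toy weights;
- the linear, capped schedule for $\alpha_t$.

The sketch also assumes that the classifier outputs the probability that retrieval is needed. Under the assumed schedule, $\alpha_t$ grows with the timestep, which makes retrieval rarer at later steps.

```python
import math

# HYPOTHETICAL features: the paper's real scalar features are not listed here.
def nmt_features(probs):
    """Return [top-1 probability, entropy] of the NMT output distribution."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return [max(probs), entropy]

def mlp_retrieval_prob(features, w1, b1, w2, b2):
    """Two-layer MLP (ReLU hidden layer, sigmoid output) giving the estimated
    probability that kNN retrieval is needed at the current timestep."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

def timestep_threshold(t, alpha_0=0.5, beta=0.01, alpha_max=0.9):
    """HYPOTHETICAL alpha_t: grows linearly with t, capped at alpha_max."""
    return min(alpha_max, alpha_0 + beta * t)

def should_retrieve(features, t, params):
    """Retrieve only if the classifier's probability exceeds alpha_t."""
    return mlp_retrieval_prob(features, *params) > timestep_threshold(t)

# Toy, untrained weights (w1, b1, w2, b2) for illustration only.
TOY_PARAMS = ([[-4.0, 2.0], [1.0, 1.0]], [1.0, 0.0], [1.5, -0.5], 0.0)

if __name__ == "__main__":
    feats = [0.5, 1.0]  # e.g. output of nmt_features(...) at some step
    for t in (0, 40):
        print(t, should_retrieve(feats, t, TOY_PARAMS))
```

With these toy weights, the same feature vector triggers retrieval at $t=0$ but not at $t=40$, because the assumed $\alpha_t$ has risen above the classifier's probability by then.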

Abstract

To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore of domain-specific translation knowledge. From this datastore it derives a $k$NN distribution, which is interpolated with the prediction distribution of the NMT model via a linear interpolation coefficient $\lambda$. Despite its success, performing $k$NN retrieval at every timestep incurs substantial time overhead. To address this issue, most existing studies resort to $k$NN-MT with adaptive retrieval ($k$NN-MT-AR). This approach dynamically estimates $\lambda$ and skips $k$NN retrieval whenever $\lambda$ falls below a fixed threshold. Unfortunately, $k$NN-MT-AR does not yield satisfactory results. In this paper, we first conduct a preliminary study that reveals two key limitations of $k$NN-MT-AR: 1) an optimization gap leads to inaccurate estimates of $\lambda$ for deciding whether to skip $k$NN retrieval, and 2) a fixed threshold cannot accommodate the varying demand for $k$NN retrieval across timesteps. To mitigate these limitations, we propose $k$NN-MT with dynamic retrieval ($k$NN-MT-DR), which extends vanilla $k$NN-MT in two ways. First, we equip $k$NN-MT with an MLP-based classifier that determines whether to skip $k$NN retrieval at each timestep. We also explore several carefully designed scalar features to fully exploit the classifier's potential. Second, we propose a timestep-aware threshold adjustment method that dynamically generates the threshold, further improving the efficiency of our model. Experimental results on widely used datasets demonstrate the effectiveness and generality of our model.\footnote{Our code is available at \url{https://github.com/DeepLearnXMU/knn-mt-dr}.}
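The interpolation and the $k$NN-MT-AR skipping rule described in the abstract can be sketched in plain Python. The sketch assumes the standard $k$NN-MT formulation:

- $p_{k\mathrm{NN}}$ is a softmax over negative neighbor distances scaled by a temperature, aggregated per target token;
- the final distribution is $\lambda\, p_{k\mathrm{NN}} + (1-\lambda)\, p_{\mathrm{NMT}}$.

Several pieces are illustrative choices, not values or interfaces from the paper: the temperature of 10, the fixed threshold of 0.5, and the `retrieve` callback that stands in for the datastore search. The function names are likewise hypothetical.

```python
import math
from collections import defaultdict

def knn_distribution(neighbors, temperature=10.0):
    """Turn retrieved (distance, target_token) pairs into p_kNN.

    Each neighbor gets weight exp(-distance / temperature); weights are
    normalized over all k neighbors and summed per target token.
    temperature=10.0 is an illustrative default, not the paper's value."""
    scores = [-d / temperature for d, _ in neighbors]
    m = max(scores)  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores)
    probs = defaultdict(float)
    for (_, tok), s in zip(neighbors, scores):
        probs[tok] += math.exp(s - m) / z
    return dict(probs)

def interpolate(p_nmt, p_knn, lam):
    """p(y) = lam * p_kNN(y) + (1 - lam) * p_NMT(y)."""
    vocab = set(p_nmt) | set(p_knn)
    return {y: lam * p_knn.get(y, 0.0) + (1.0 - lam) * p_nmt.get(y, 0.0)
            for y in vocab}

def adaptive_step(p_nmt, lam, retrieve, threshold=0.5):
    """kNN-MT-AR-style step: if the estimated lam is below a fixed
    (HYPOTHETICAL) threshold, skip the expensive datastore search and keep
    the NMT distribution; otherwise retrieve and interpolate."""
    if lam < threshold:
        return p_nmt
    return interpolate(p_nmt, knn_distribution(retrieve()), lam)

if __name__ == "__main__":
    p_nmt = {"house": 0.2, "building": 0.8}
    nbrs = lambda: [(1.0, "house"), (1.0, "house"), (1.0, "building")]
    print(adaptive_step(p_nmt, 0.7, nbrs))  # retrieval performed, mixed
    print(adaptive_step(p_nmt, 0.2, nbrs))  # retrieval skipped
```

Passing retrieval as a callback makes the efficiency argument visible: when the rule skips a step, the search is never executed. The abstract's first limitation concerns exactly this rule, since an inaccurately estimated $\lambda$ leads to wrong skip decisions.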

Paper Structure (34 sections, 4 equations, 2 figures, 13 tables)

Introduction
Related Work
Datastore Compression.
Retrieval Reduction.
Preliminary Study
Background
Datastore Construction.
Translating with Retrieved Pairs.
$k$NN-MT with Adaptive Retrieval
Limitations of $k$NN-MT-AR.
Our Model
Classifier for Determining $k$NN Retrieval Skipping
Construction of Training Samples
Input Features.
Classifier Training.
...and 19 more sections

Figures (2)

Figure 1: Change in BLEU improvement between adjacent timestep intervals. [0,5] means that $k$NN-MT performs retrieval only at timesteps 0 to 5. For Subtitles, only the first three changes between adjacent intervals are shown, because only about 1.35% of its examples have a length of at least 25.
Figure 2: Decoding speed (#Tok/Sec $\uparrow$) of vanilla $k$NN-MT and our model, with a batch size of $128$.
