
Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval

Yan Gao, Zhiwei Cao, Zhongjian Miao, Baosong Yang, Shiyu Liu, Min Zhang, Jinsong Su

TL;DR

This paper addresses the latency of per-timestep retrieval in non-parametric NMT. It first reveals the limitations of $\lambda$-based retrieval skipping in kNN-MT-AR and then introduces kNN-MT-DR. The core contribution is a binary MLP classifier that explicitly decides whether to perform $k$NN retrieval at each decoding step, guided by carefully chosen scalar features and a timestep-aware threshold $\alpha_t$. The approach yields substantial decoding-speed gains with minimal loss in translation quality and remains compatible with datastore compression and adaptive kNN frameworks. Empirical results on multi-domain German-English and cross-language tasks demonstrate improved efficiency and robust performance, making non-parametric domain adaptation more practical across diverse settings.
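The decision rule described in the TL;DR can be pictured with a minimal Python sketch. It is not the authors' implementation: the feature set (maximum NMT probability, NMT entropy, timestep), the hidden size, the class `RetrievalClassifier`, and the linear schedule in `alpha_t` are all hypothetical placeholders, and the classifier is left untrained. It only illustrates the shape of the mechanism: an MLP maps scalar features to a retrieval probability, and retrieval runs only when that probability clears a threshold that depends on the timestep.

```python
import numpy as np


class RetrievalClassifier:
    """Hypothetical two-layer MLP mapping scalar features to P(retrieve at this step)."""

    def __init__(self, n_features, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained weights: this sketch shows the forward pass only.
        self.w1 = rng.normal(0.0, 0.1, (n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, hidden)
        self.b2 = 0.0

    def prob_retrieve(self, features):
        h = np.maximum(0.0, features @ self.w1 + self.b1)  # ReLU hidden layer
        z = h @ self.w2 + self.b2
        return 1.0 / (1.0 + np.exp(-z))  # sigmoid -> probability of retrieving


def scalar_features(p_nmt, t):
    """Hypothetical features: max NMT probability, NMT entropy, and timestep t."""
    p = np.clip(p_nmt, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum()
    return np.array([p_nmt.max(), entropy, float(t)])


def alpha_t(t, base=0.5, slope=0.01, cap=0.9):
    """Placeholder timestep-aware threshold (assumed linear schedule, NOT the paper's formula)."""
    return min(base + slope * t, cap)


def should_retrieve(clf, p_nmt, t):
    """Retrieve only if the classifier's probability clears the threshold for step t."""
    return clf.prob_retrieve(scalar_features(p_nmt, t)) >= alpha_t(t)
```

For example, `should_retrieve(RetrievalClassifier(3), np.array([0.7, 0.2, 0.1]), 4)` returns a boolean; with trained weights, steps where it is False would skip the datastore search entirely, which is where the decoding speed-up would come from.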

Abstract

To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore that stores domain-specific translation knowledge; from it, a $k$NN distribution is derived and interpolated with the prediction distribution of the NMT model via a linear interpolation coefficient $\lambda$. Despite its success, performing $k$NN retrieval at every timestep incurs substantial time overhead. To address this issue, prevailing studies resort to $k$NN-MT with adaptive retrieval ($k$NN-MT-AR), which dynamically estimates $\lambda$ and skips $k$NN retrieval if $\lambda$ is less than a fixed threshold. Unfortunately, $k$NN-MT-AR does not yield satisfactory results. In this paper, we first conduct a preliminary study that reveals two key limitations of $k$NN-MT-AR: 1) the optimization gap leads to inaccurate estimation of $\lambda$ for deciding whether to skip $k$NN retrieval, and 2) a fixed threshold fails to accommodate the varying demand for $k$NN retrieval at different timesteps. To mitigate these limitations, we then propose $k$NN-MT with dynamic retrieval ($k$NN-MT-DR), which significantly extends vanilla $k$NN-MT in two aspects. First, we equip $k$NN-MT with an MLP-based classifier that determines whether to skip $k$NN retrieval at each timestep. In particular, we explore several carefully designed scalar features to fully exploit the potential of the classifier. Second, we propose a timestep-aware threshold adjustment method that dynamically generates the threshold, further improving the efficiency of our model. Experimental results on widely used datasets demonstrate the effectiveness and generality of our model. Our code is available at https://github.com/DeepLearnXMU/knn-mt-dr.
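For readers less familiar with $k$NN-MT, the interpolation and the $k$NN-MT-AR skipping rule summarized in the abstract can be sketched in Python. This is a minimal illustration, not the paper's code: it assumes the common kNN-MT recipe of a softmax over negative retrieval distances with a temperature (the value 10.0 is an arbitrary placeholder), and the fixed threshold of 0.5 and the `retrieve` callback, which stands in for a datastore search returning target tokens and distances, are hypothetical. How $\lambda$ is estimated is left out; it is simply passed in.

```python
import numpy as np


def knn_distribution(neighbor_tokens, neighbor_dists, vocab_size, temperature=10.0):
    """Turn k retrieved (token, distance) pairs into a distribution over the vocabulary.

    Softmax over negative distances; weights of neighbors that share the same
    target token are summed into that token's probability.
    """
    logits = -np.asarray(neighbor_dists, dtype=float) / temperature
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, np.asarray(neighbor_tokens), weights)  # accumulates duplicate tokens
    return p_knn


def interpolate(p_nmt, p_knn, lam):
    """Final distribution: lam * p_kNN + (1 - lam) * p_NMT."""
    return lam * p_knn + (1.0 - lam) * p_nmt


def ar_step(p_nmt, lam, retrieve, threshold=0.5):
    """kNN-MT-AR style step: skip retrieval when the estimated lam is below a fixed threshold."""
    if lam < threshold:
        return p_nmt, False  # retrieval skipped; fall back to the NMT distribution
    tokens, dists = retrieve()  # hypothetical datastore search
    p_knn = knn_distribution(tokens, dists, vocab_size=len(p_nmt))
    return interpolate(p_nmt, p_knn, lam), True
```

Because the skip decision in `ar_step` reuses $\lambda$ itself, any error in estimating $\lambda$ turns directly into wrong skip decisions, which is the first limitation the abstract points out; the second is that `threshold` stays the same at every timestep.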

Paper Structure

This paper contains 34 sections, 4 equations, 2 figures, and 13 tables.

Figures (2)

  • Figure 1: Changes in BLEU improvement between adjacent intervals. [0,5] means that $k$NN-MT performs retrieval only when the timestep is between 0 and 5. For the Subtitles domain, we display only the first three BLEU improvements between adjacent intervals, since only about 1.35% of its examples have length $\geq 25$.
  • Figure 2: Decoding speed (#Tok/Sec $\uparrow$) of vanilla $k$NN-MT and ours, with the batch size set to $128$.
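To make the interval notation in Figure 1 concrete, here is a minimal Python sketch of interval-restricted retrieval. It assumes 0-based timesteps and inclusive endpoints, matching the caption's reading of [0,5] as timesteps 0 through 5; `step_fn` is a hypothetical stand-in for one decoding step of a kNN-MT system, and real decoding would also stop early at an end-of-sentence token.

```python
def retrieval_mask(num_steps, interval):
    """True at timestep t iff retrieval is enabled, i.e. lo <= t <= hi (inclusive)."""
    lo, hi = interval
    return [lo <= t <= hi for t in range(num_steps)]


def decode_with_interval(step_fn, num_steps, interval):
    """Run a decoding loop where kNN retrieval is enabled only inside `interval`.

    step_fn(t, use_knn) is a hypothetical callback that performs one decoding
    step and returns the token produced at timestep t.
    """
    mask = retrieval_mask(num_steps, interval)
    tokens = [step_fn(t, use_knn) for t, use_knn in enumerate(mask)]
    return tokens, sum(mask)  # output tokens and number of retrieval calls
```

Comparing the BLEU obtained with, say, [0,5] against a wider interval then isolates how much the extra retrieval steps contribute, which is the kind of difference Figure 1 plots.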