Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing

Vilém Zouhar

TL;DR

This work investigates how subword vocabularies, particularly BPE, affect learning-based MT model stealing. It introduces a formal MT stealing setup, contrasts black-box and gray-box access, and evaluates via BLEU how vocabulary choices influence student performance. The findings indicate that the victim's BPE vocabulary has only a marginal impact on the stolen model's performance, while gray-box access enables efficient recovery of the victim's vocabulary with high overlap, which raises security considerations for knowledge distillation. Practical attacks can reconstruct vocabularies from outputs, and domain-aligned vocabularies matter more for efficiency than exact vocabulary replication, with implications for defenses against model stealing and distillation. Two quantities used in the analysis are the BPE efficiency $\frac{|B_i(D_j)|}{|B_j(D_j)|}$ and the vocabulary overlap $\frac{2|V\cap V'|}{|V|+|V'|}$.
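The two quantities named above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this summary: $V$ and $V'$ are treated as plain sets of subword strings, and $|B_i(D_j)|$ is read as the number of subwords produced when BPE model $B_i$ segments dataset $D_j$. The function names `vocabulary_overlap` and `bpe_efficiency` are hypothetical and are not taken from the paper.

```python
def vocabulary_overlap(v, v_prime):
    """Overlap 2|V ∩ V'| / (|V| + |V'|) between two vocabularies.

    Assumption: each vocabulary is an iterable of subword strings.
    """
    v, v_prime = set(v), set(v_prime)
    if not v and not v_prime:
        # Two empty vocabularies are treated as identical (a convention chosen here).
        return 1.0
    return 2 * len(v & v_prime) / (len(v) + len(v_prime))


def bpe_efficiency(num_subwords_other, num_subwords_native):
    """Ratio |B_i(D_j)| / |B_j(D_j)|.

    Assumption: the arguments are the subword counts obtained when segmenting
    the same dataset D_j with BPE model B_i (other) and B_j (native).
    """
    if num_subwords_native <= 0:
        raise ValueError("native subword count must be positive")
    return num_subwords_other / num_subwords_native


if __name__ == "__main__":
    victim = {"the", "cat", "s", "dog"}
    local = {"the", "cat", "dog", "house"}
    print(vocabulary_overlap(victim, local))
    print(bpe_efficiency(120, 100))
```

Under this reading, an efficiency above 1 means the out-of-domain BPE model needs more subwords than the in-domain one to encode the same text.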

Abstract

In learning-based functionality stealing, the attacker is trying to build a local model based on the victim's outputs. The attacker has to make choices regarding the local model's architecture, optimization method and, specifically for NLP models, subword vocabulary, such as BPE. On the machine translation task, we explore (1) whether the choice of the vocabulary plays a role in model stealing scenarios and (2) whether it is possible to extract the victim's vocabulary. We find that the vocabulary itself does not have a large effect on the local model's performance. Given gray-box model access, it is possible to recover the victim's vocabulary by collecting its outputs (detokenized subwords on the output). The finding that vocabulary choice has only a minimal effect is important more broadly for black-box knowledge distillation.
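The idea of collecting a vocabulary from gray-box outputs can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's procedure: it assumes the attacker receives each translation already split into subwords, and the `@@` continuation marker in the sample data is a convention borrowed from common BPE tooling rather than something this summary specifies. The function name `collect_vocabulary` is hypothetical.

```python
def collect_vocabulary(segmented_outputs):
    """Union of all subwords observed in the victim's segmented outputs.

    Assumption: `segmented_outputs` is an iterable of subword lists,
    one list per translated sentence returned by the victim model.
    """
    vocab = set()
    for subwords in segmented_outputs:
        vocab.update(subwords)
    return vocab


if __name__ == "__main__":
    # Hypothetical gray-box outputs; the "@@" marker is an assumed convention.
    outputs = [
        ["the", "cat@@", "s", "sleep"],
        ["the", "dog", "sleep@@", "s"],
    ]
    print(sorted(collect_vocabulary(outputs)))
```

Only target-side subwords that the victim actually emits are observable this way, so the recovered set grows with the number and variety of queries.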

Paper Structure (17 sections, 2 figures, 8 tables)

This paper contains 17 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Overlap with the victim's vocabulary. Numbers in square brackets indicate the subword translation budget. Points $\blacksquare$ are BPEs trained locally, $\CIRCLE$ are vocabularies collected from model output, and $\blacktriangle$ are vocabularies collected from model output starting from a single sentence (max of 5).
  • Figure 2: Overlaps between vocabularies of different runs of the cyclic translations from single sentence algorithm (\ref{lst:cyclic_vocab}) and vocabulary sizes. Points show the average of 5 seeds and bars the maximum.