An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference

Atsuki Yamaguchi; Aline Villavicencio; Nikolaos Aletras

TL;DR

An empirical study of five cross-lingual vocabulary adaptation (CVA) methods on four generative LLMs across four typologically diverse languages and four natural language understanding tasks finds that CVA substantially contributes to LLM inference speedups of up to 271.5%.

Abstract

The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabulary and pre-training data. Although some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English, which increases inference time and costs. Cross-lingual vocabulary adaptation (CVA) methods have been proposed for adapting models to a target language, with the aim of improving downstream performance. However, the effectiveness of these methods in increasing the inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of five CVA methods on four generative LLMs (including monolingual and multilingual models) across four typologically diverse languages and four natural language understanding tasks. We find that CVA substantially contributes to LLM inference speedups of up to 271.5%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to that of the original models.
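The abstract attributes the efficiency gap to English-centric tokenizers: text in other languages is split into more tokens, and autoregressive generation spends one decoding step per generated token. The toy sketch below illustrates that mechanism under loudly stated assumptions: `greedy_tokenize` is a hypothetical greedy longest-match segmenter (not the BPE tokenizer of any model studied here), and both vocabularies are made up for illustration. Characters not covered by a longer vocabulary entry fall back to single-character tokens, loosely mimicking character/byte fallback.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Hypothetical greedy longest-match-first segmenter (not BPE).

    Any character not covered by a longer vocabulary entry becomes a
    single-character token, loosely mimicking character/byte fallback.
    """
    # Longest entry length bounds the search; at least 1 so we always advance.
    max_len = max(1, max((len(t) for t in vocab), default=1))
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


# Made-up vocabularies: an English-centric one and a German-specific one.
source_vocab = {"the", "ing", "tion", "and"}
target_vocab = {"sprachmodelle", "sind", "schnell", " "}

text = "sprachmodelle sind schnell"
src_tokens = greedy_tokenize(text, source_vocab)  # falls back to characters
tgt_tokens = greedy_tokenize(text, target_vocab)

# One autoregressive decoding step per generated token.
print(len(src_tokens), len(tgt_tokens), round(len(src_tokens) / len(tgt_tokens), 2))  # 26 5 5.2
```

In this toy setup the English-centric vocabulary needs 26 decoding steps to emit the German phrase versus 5 for the target vocabulary. Reading the step ratio as a speedup is only a first approximation: per-step cost also depends on vocabulary size (the output projection) and sequence length, and the sketch does not reproduce how speedups are measured in the paper.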

Paper Structure (56 sections, 10 figures, 13 tables)

Introduction
Related Work
Impact of Tokenization on LLMs
Cross-lingual Vocabulary Adaptation
Cross-lingual Vocabulary Adaptation
Problem Setting
Target Vocabulary Initialization Methods
Random.
Cross-lingual and Progressive Initialization (CLP).
Heuristics.
FOCUS.
CLP+.
Experimental Setup
Source Models
Target Languages and Adaptation Data
...and 41 more sections
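The outline names five target vocabulary initialization methods (Random, CLP, Heuristics, FOCUS, CLP+) without describing them. For orientation only, here is a minimal sketch of one generic heuristic for initializing the embeddings of a new target vocabulary: each target token is re-tokenized with the source tokenizer and assigned the mean of the corresponding source embeddings, with small random values as a fallback when no source piece is known. This is an assumption-labeled illustration, not a reimplementation of any of the five methods; `init_target_embeddings`, the toy embeddings and the character-level stand-in for the source tokenizer are all hypothetical.

```python
import random
from typing import Callable

Vector = list[float]


def init_target_embeddings(
    target_vocab: list[str],
    source_tokenize: Callable[[str], list[str]],
    source_embeddings: dict[str, Vector],
    dim: int,
    seed: int = 0,
) -> dict[str, Vector]:
    """Hypothetical mean-of-source-pieces initialization with random fallback."""
    rng = random.Random(seed)
    new_embeddings: dict[str, Vector] = {}
    for token in target_vocab:
        # Re-tokenize the target token with the source tokenizer, keeping known pieces.
        pieces = [p for p in source_tokenize(token) if p in source_embeddings]
        if pieces:
            vectors = [source_embeddings[p] for p in pieces]
            # Element-wise mean over the source piece embeddings.
            new_embeddings[token] = [sum(col) / len(vectors) for col in zip(*vectors)]
        else:
            # No known source piece: fall back to small random values.
            new_embeddings[token] = [rng.gauss(0.0, 0.02) for _ in range(dim)]
    return new_embeddings


# Toy 2-dimensional source embeddings (made up for illustration).
source_embeddings = {
    "s": [1.0, 0.0],
    "i": [0.0, 1.0],
    "n": [1.0, 1.0],
    "d": [0.0, 0.0],
}

# A plain character split stands in for the source tokenizer (an assumption).
emb = init_target_embeddings(["sind", "s", "ä"], list, source_embeddings, dim=2)
print(emb["sind"], emb["s"])  # [0.5, 0.5] [1.0, 0.0]
```

A token that the source tokenizer maps to itself (such as `"s"` here) simply has its source embedding copied. An unweighted mean is the simplest combination rule; more elaborate methods may weight or select source embeddings differently, which this sketch does not attempt.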

Figures (10)

Figure 1: Example of overfragmentation when applying the Mistral-7B tokenizer to non-English text.
Figure 2: Speedup ratios relative to each base model/tokenizer when prompted in English and in a target language. Dotted lines denote the average speedup ratio across tasks in each setting.
Figure 3: Performance difference between English and in-language prompts. Positive and negative values indicate better performance using in-language or English prompts respectively.
Figure 4: Kendall's $\tau$ correlation between the number of LAPT steps and performance (in-language prompting).
Figure 5: Performance changes in span with respect to LoRA rank $r$.
...and 5 more figures