Fast Vocabulary Transfer for Language Model Compression
Leonidas Gee, Andrea Zugarini, Leonardo Rigutini, Paolo Torroni
TL;DR
The paper addresses the high cost of large pre-trained language models with Fast Vocabulary Transfer (FVT), a lightweight method for adapting a general-domain LM to a smaller, in-domain tokenizer. The in-domain embeddings are initialized from the general LM's embeddings, and the model is then fine-tuned with masked language modeling and on the downstream task. This reduces model size and speeds up inference while largely preserving performance, particularly in specialized domains such as medicine and law. The study shows that FVT is complementary to knowledge distillation (KD), enabling further compression of up to approximately 2.75x without substantial accuracy loss. Overall, vocabulary transfer (VT), and FVT in particular, offers a practical avenue for deploying models across vertical domains that is orthogonal to other compression techniques, with potential for deeper integration with KD in future work.
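The embedding initialization step can be illustrated with a minimal sketch. The summary above does not spell out the exact rule, so the code assumes the following: a token present in both vocabularies keeps its general-domain embedding, and a new in-domain token receives the average of the embeddings of the pieces the general tokenizer splits it into. The function names, the toy vocabulary, and the greedy tokenizer are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Vector = List[float]


def transfer_embeddings(
    general_emb: Dict[str, Vector],
    in_domain_vocab: List[str],
    general_tokenize: Callable[[str], List[str]],
) -> Dict[str, Vector]:
    """Initialize in-domain token embeddings from a general-domain table.

    Assumed rule: shared tokens are copied; new tokens get the mean of the
    embeddings of their general-tokenizer pieces.
    """
    new_emb: Dict[str, Vector] = {}
    for token in in_domain_vocab:
        if token in general_emb:
            # Token exists in both vocabularies: copy its embedding.
            new_emb[token] = list(general_emb[token])
            continue
        # New token: decompose it with the general tokenizer and average.
        pieces = general_tokenize(token)
        dim = len(general_emb[pieces[0]])
        new_emb[token] = [
            sum(general_emb[p][i] for p in pieces) / len(pieces)
            for i in range(dim)
        ]
    return new_emb


# Toy general-domain embedding table (hypothetical values, dimension 2).
general_emb = {"cardio": [1.0, 0.0], "logy": [0.0, 1.0], "the": [0.2, 0.8]}


def toy_tokenize(token: str) -> List[str]:
    """Hypothetical greedy longest-prefix split over the toy vocabulary."""
    pieces, rest = [], token
    while rest:
        for end in range(len(rest), 0, -1):
            if rest[:end] in general_emb:
                pieces.append(rest[:end])
                rest = rest[end:]
                break
        else:
            raise ValueError(f"cannot tokenize {token!r}")
    return pieces


in_domain_emb = transfer_embeddings(
    general_emb, ["the", "cardiology"], toy_tokenize
)
print(in_domain_emb["cardiology"])  # mean of "cardio" and "logy"
```

In this toy setup, "cardiology" is a single token in the in-domain vocabulary but two tokens in the general one, so in-domain text is encoded as shorter sequences. The resulting table is only a starting point; the fine-tuning stages described above would follow.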
Abstract
Real-world business applications require a trade-off between language model performance and size. We propose a new method for model compression that relies on vocabulary transfer. We evaluate the method on various vertical domains and downstream tasks. Our results indicate that vocabulary transfer can be effectively used in combination with other compression techniques, yielding a significant reduction in model size and inference time while marginally compromising on performance.
