Generation with Dynamic Vocabulary

Yanting Liu, Tao Ji, Changzhi Sun, Yuanbin Wu, Xiaoling Wang

TL;DR

The ability to generate multi-token spans atomically improves both generation quality and efficiency, and it can be deployed in a plug-and-play way, which makes it attractive for various downstream applications.

Abstract

We introduce a new dynamic vocabulary for language models. It can involve arbitrary text spans during generation. These text spans act as basic generation bricks, akin to tokens in traditional static vocabularies. We show that the ability to generate multi-token spans atomically improves both generation quality and efficiency (compared to the standard language model, the MAUVE metric increases by 25% and latency decreases by 20%). The dynamic vocabulary can be deployed in a plug-and-play way, which makes it attractive for various downstream applications. For example, we demonstrate that the dynamic vocabulary can be applied to different domains in a training-free manner. It also helps to generate reliable citations in question answering tasks (substantially enhancing citation results without compromising answer accuracy).

Paper Structure

This paper contains 37 sections, 3 equations, 6 figures, 15 tables.

Figures (6)

  • Figure 1: Generation with dynamic vocabulary. The model's vocabulary dynamically changes based on the input text, with phrases serving as basic blocks both for input and output.
  • Figure 2: The overall architecture of our proposed dynamic vocabulary. During training, there are four sources of negative phrases: pre-batch, corpus-retrieval, self-retrieval, and generation. Phrases are embedded by the dynamic phrase encoder with an additional linear layer. The hidden layer of the last token serves as the phrase embedding. In the model input layer, phrases are treated as a basic brick without splitting into tokens.
  • Figure 3: A comparison between texts generated by our proposed model and GPT-2. The tokens highlighted in blue are from dynamic vocabulary while others are from fixed token ones.
  • Figure 4: A comparison between texts generated by our proposed model and GPT-2. The tokens highlighted in blue are from dynamic vocabulary while others are from fixed token ones.
  • Figure 5: A comparison between texts generated by our proposed model and GPT-2. The tokens highlighted in blue are from dynamic vocabulary while others are from fixed token ones.
  • ...and 1 more figure
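
The idea in the captions of Figures 1 and 2 can be illustrated with a toy sketch: candidate phrases are embedded from the hidden state of their last token followed by a linear layer, and the resulting phrase embeddings are scored alongside the static token embeddings in a single softmax. This is only a minimal illustration under stated assumptions, not the authors' implementation. The names `STATIC_VOCAB`, `TOY_HIDDEN`, `encode_phrase`, and `next_unit_distribution`, the 2-dimensional vectors, and the lookup table standing in for a real phrase encoder are all hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear(W, x):
    """Apply a linear layer (matrix W, no bias) to vector x."""
    return [dot(row, x) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical static vocabulary with toy 2-d output embeddings.
STATIC_VOCAB = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}

# Hypothetical stand-in for a phrase encoder: a fixed "hidden state" per
# last token. A real encoder would compute contextual hidden states.
TOY_HIDDEN = {"mat": [2.0, 2.0], "York": [-1.0, 0.0]}

def encode_phrase(phrase, W):
    """Embed a phrase as linear(hidden state of its last token)."""
    last_token = phrase.split()[-1]
    return linear(W, TOY_HIDDEN[last_token])

def next_unit_distribution(context_vec, phrases, W):
    """Score static tokens and dynamic phrases together in one softmax."""
    units = list(STATIC_VOCAB) + list(phrases)
    embeddings = [STATIC_VOCAB[t] for t in STATIC_VOCAB]
    embeddings += [encode_phrase(p, W) for p in phrases]
    probs = softmax([dot(context_vec, e) for e in embeddings])
    return dict(zip(units, probs))

# Usage: the vocabulary is extended with two phrases for this input only.
identity = [[1.0, 0.0], [0.0, 1.0]]
dist = next_unit_distribution([1.0, 1.0], ["on the mat", "New York"], identity)
best = max(dist, key=dist.get)  # a whole phrase can be emitted in one step
```

Because a selected phrase is emitted as a single unit, one decoding step can produce several tokens, which is the source of the efficiency gain described in the abstract.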