Monotonic Paraphrasing Improves Generalization of Language Model Prompting

Qin Liu, Fei Wang, Nan Xu, Tianyi Yan, Tao Meng, Muhao Chen

TL;DR

The paper proposes MonoPara, an end-to-end decoding strategy that paraphrases prompts or instructions into lower-perplexity counterparts. It ensembles a paraphrase LM, which rewrites the prompt or instruction, with a target LM, which constrains generation toward lower perplexity.

Abstract

Performance of large language models (LLMs) may vary with different prompts or instructions, even for the same task. One commonly recognized factor for this phenomenon is the model's familiarity with the given prompt or instruction, which is typically estimated by its perplexity. However, finding the prompt with the lowest perplexity is challenging, given the enormous space of possible prompting phrases. In this paper, we propose monotonic paraphrasing (MonoPara), an end-to-end decoding strategy that paraphrases given prompts or instructions into their lower-perplexity counterparts based on an ensemble of a paraphrase LM for prompt (or instruction) rewriting, and a target LM (i.e., the prompt or instruction executor) that constrains the generation for lower perplexity. The ensemble decoding process can efficiently paraphrase the original prompt without altering its semantic meaning, while monotonically decreasing the perplexity of each generation as calculated by the target LM. We explore in detail both greedy and search-based decoding as two alternative decoding schemes of MonoPara. Notably, MonoPara does not require any training and can monotonically lower the perplexity of the paraphrased prompt or instruction, leading to improved performance of zero-shot LM prompting as evaluated on a wide selection of tasks. In addition, MonoPara is also shown to effectively improve LMs' generalization on perturbed and unseen task instructions.
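
As a rough illustration of the familiarity measure the abstract relies on, the sketch below computes perplexity from per-token log-probabilities using the standard definition (the exponential of the negative mean log-probability). The function name `perplexity` and the two lists of log-probabilities are hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability of the tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical per-token log-probabilities that a target LM might assign
# to two phrasings of the same instruction (made-up numbers).
familiar = [math.log(0.5), math.log(0.4), math.log(0.5)]
unfamiliar = [math.log(0.1), math.log(0.05), math.log(0.2)]

print(perplexity(familiar) < perplexity(unfamiliar))
```

Under this definition, a phrasing whose tokens the target LM finds more probable has lower perplexity, which is the quantity MonoPara tries to reduce while paraphrasing.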

Paper Structure (32 sections, 4 equations, 3 figures, 9 tables, 2 algorithms)

Figures (3)

  • Figure 1: Perplexity of $x_{para}$ as the output paraphrase of $P_{para}$ vs. as the input prompt of $P_{tar}$ for the AG News dataset with Mistral 7B as both $P_{para}$ and $P_{tar}$. Each point stands for a different prompt $x_{para}$. A low-perplexity paraphrase does not necessarily result in a low-perplexity prompt for the target model.
  • Figure 2: Two explored decoding schemes of MonoPara. Ensemble-based decoding (bottom) combines the token probabilities from the paraphrase model and the target model in each decoding step. Search-based decoding (top) further leverages look-ahead decoding to consider the potential future impact of current choices.
  • Figure 3: Model's average accuracy across four GLUE datasets, with each dataset having six instructions with perturbation added at character, word, and semantic levels. Mono-E consistently improves accuracy across all types of perturbation compared to vanilla paraphrasing.
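
The ensemble-based decoding step described in the Figure 2 caption can be sketched as follows. This is a minimal toy version under stated assumptions, not the authors' implementation: the captions do not give the exact combination rule, so the weighted sum of log-probabilities, the weight `alpha`, the function name `ensemble_step`, and the toy distributions are all hypothetical.

```python
import math


def ensemble_step(para_probs, tar_probs, alpha=0.5):
    """Choose the next token by mixing two next-token distributions.

    ASSUMPTION: scores are a weighted sum of log-probabilities, with
    `alpha` weighting the target model. The source does not state the
    combination rule; this is only one plausible choice.
    """
    scores = {}
    for token, p in para_probs.items():
        q = tar_probs.get(token, 0.0)
        if p <= 0.0 or q <= 0.0:
            continue  # skip tokens that either model rules out
        scores[token] = (1 - alpha) * math.log(p) + alpha * math.log(q)
    if not scores:
        raise ValueError("no token has nonzero probability under both models")
    return max(scores, key=scores.get)


# Toy next-token distributions (made-up numbers, not from the paper).
para = {"classify": 0.5, "categorize": 0.4, "sort": 0.1}
tar = {"classify": 0.2, "categorize": 0.7, "sort": 0.1}

print(ensemble_step(para, tar, alpha=0.5))  # target model influences the choice
print(ensemble_step(para, tar, alpha=0.0))  # paraphrase model alone
```

With `alpha=0.0` the step reduces to plain greedy paraphrasing; raising `alpha` lets the target model steer the choice toward tokens it finds more probable. The search-based scheme in the caption additionally looks ahead before committing to a token, which this sketch does not attempt.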