Impact of Decoding Methods on Human Alignment of Conversational LLMs

Shaz Furniturewala, Kokil Jaidka, Yashvardhan Sharma

TL;DR

The paper investigates how decoding strategies shape alignment between conversational LLM outputs and human interactions. It introduces two synthetic turn-by-turn corpora and novel metrics capturing substance, style, and psychometric alignment. The results show that a small number of beams combined with low nucleus-sampling probability improves human-likeness, while Top-K has limited effect and longer dialogues can enhance alignment on some datasets. The findings highlight the importance of decoding choices and interaction context for deploying more natural and engaging chatbots.

Abstract

To be integrated into chatbot systems, large language models (LLMs) must be aligned with human conversational conventions. However, being trained mainly on web-scraped data gives existing LLMs a voice closer to informational text than to actual human speech. In this paper, we examine the effect of decoding methods, including Beam Search, Top K Sampling, and Nucleus Sampling, on the alignment between LLM-generated and human conversations. We present new measures of alignment in substance, style, and psychometric orientation, and experiment with two conversation datasets. Our results provide subtle insights: better alignment is attributed to fewer beams in Beam Search and lower values of P in Nucleus Sampling. We also find that task-oriented and open-ended datasets perform differently in terms of alignment, indicating the importance of taking the context of the interaction into account.
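For readers unfamiliar with the sampling parameters named in the abstract, the following is a minimal sketch of how Top K and Nucleus (top-P) filtering restrict a next-token distribution before sampling. It is an illustration only, not the authors' implementation: the function names and the toy distribution `toy_probs` are hypothetical, and Beam Search is not covered here.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def nucleus_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

def sample(filtered, rng=random):
    """Draw one token id from a filtered {token_id: probability} mapping."""
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution over a 4-token vocabulary.
toy_probs = [0.5, 0.25, 0.125, 0.125]

print(sorted(nucleus_filter(toy_probs, 0.7)))  # token ids kept at P = 0.7
print(sorted(nucleus_filter(toy_probs, 0.4)))  # token ids kept at P = 0.4
print(sorted(top_k_filter(toy_probs, 3)))      # token ids kept at K = 3
```

In this sketch, a lower P shrinks the candidate set toward the single most probable token, which makes the output less varied; the paper's finding concerns how such settings affect alignment with human conversation, which this toy example does not measure.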

Paper Structure (16 sections, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Turn-based scores for each decoding parameter, averaged across all perturbations and both models (Llama 2 and Llama 3).
  • Figure 2: The parameters effecting significant positive and negative changes in style, psychometrics and semantics of LLM conversations. Calculated using multi-level models controlling for model and dataset differences.