Dynamic Stochastic Decoding Strategy for Open-Domain Dialogue Generation

Yiwei Li; Fei Mi; Yitong Li; Yasheng Wang; Bin Sun; Shaoxiong Feng; Kan Li

TL;DR

The paper tackles the challenge of conflicting decoding requirements in open-domain dialogue, where chit-chat demands diversity while factual QA requires reliability. It introduces Dynamic Decoding Strategy (DDS), which uses a learnable diversity score to adapt the decoding space via context-sensitive temperature mappings, compatible with temperature, top-$k$, top-$p$, and locally typical sampling. DDS can operate at both sentence- and token-level granularity and can be applied during training through dynamic temperature adjustments, improving both diversity and factual accuracy across diverse datasets and models. Experiments on Chinese and multilingual data show consistent gains over strong baselines, with ablations confirming the effectiveness of diverse mapping strategies, token-level adaptations, and dynamic training for robust, generalizable dialogue generation.

Abstract

Stochastic sampling strategies such as top-k and top-p are widely used in dialogue generation. However, an open-domain chat system faces two different conversation scenarios: chit-chat and knowledge-based question answering. In the former, response diversity is essential because of the one-to-many nature of dialogue. The latter requires less randomness, since stochastic decoding risks generating incorrect information. An adaptive and flexible decoding strategy is therefore needed to handle both scenarios simultaneously. To this end, we propose the dynamic decoding strategy (DDS), which adjusts the decoding space according to the context. DDS supports both sequence-level and token-level adaptive search within a unified framework. Moreover, the adaptive algorithm can be used not only during inference but also during training to further improve performance. Comprehensive experiments indicate that the proposed decoding strategy consistently improves the performance of pre-trained dialogue models when coupled with four widely used stochastic decoding algorithms.
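To make the idea of a context-dependent decoding space concrete, the following is a minimal sketch, not the authors' implementation. It assumes a diversity score in [0, 1] is already available for the context (in the paper it is predicted by a learned regression head) and maps it to a temperature with a hypothetical linear mapping. The function names, the bounds `t_min` and `t_max`, and the linear form are all illustrative assumptions.

```python
import math
import random


def map_score_to_temperature(score, t_min=0.3, t_max=1.2):
    """Hypothetical linear mapping from a diversity score in [0, 1] to a temperature.

    The bounds and the linear form are assumptions made for illustration only.
    """
    score = min(max(score, 0.0), 1.0)  # clamp to the assumed score range
    return t_min + score * (t_max - t_min)


def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Apply optional top-k and top-p truncation, then renormalize.

    top_k must be a positive integer if given. The top-p cumulative mass is
    measured on the probabilities before renormalization.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    filtered = [0.0] * len(probs)
    for i in order:
        filtered[i] = probs[i] / total
    return filtered


def dynamic_sample(logits, diversity_score, top_k=None, top_p=None, rng=random):
    """Sample one token index using a temperature chosen from the diversity score."""
    temperature = map_score_to_temperature(diversity_score)
    probs = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under these assumptions, a QA-like context with a low score (say 0.1) yields a temperature close to `t_min` and a sharply peaked distribution, while a chit-chat context with a high score (say 0.9) yields a flatter distribution. Calling `dynamic_sample` once per generation step with a per-step score would correspond to token-level adaptation, and reusing a single score for the whole response to sequence-level adaptation.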

Paper Structure (29 sections, 7 equations, 4 figures, 13 tables)

Introduction
Background
Dialogue Generation
Stochastic Decoding Algorithms
Temperature Sampling
Top-$k$ Sampling
Top-$p$ Sampling
Locally Typical Sampling
Methodology
Diversity Score Calculation
Diversity Score Training
Sentence-level
Token-level
Temperature Mapping Strategies
Dynamic Temperature in Training
...and 14 more sections

Figures (4)

Figure 1: An overview of the process of DDS: (a) Calculating the diversity score. (b) Training the regression head. (c) Mapping score to temperature. (d) Dynamic decoding and training.
Figure 2: Different mapping strategies to project the diversity score to temperature.
Figure 3: Similarity score distributions of LCCC (left) and LQA (right). The former is a chit-chat dataset and the latter is a QA dataset. The samples are generated by PanGu-Bot and the scores are computed with BERTScore. Although overall scores for the chit-chat data are lower, there are also some noisy samples with much higher similarity scores in the chit-chat data and lower scores in the QA data.
Figure 4: Token-level diversity score (normalized) over generation steps.
