
EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling

Shimao Zhang, Yu Bao, Shujian Huang

TL;DR

The paper addresses the limitation of a fixed temperature in LLM decoding by proposing Entropy-based Dynamic Temperature (EDT) sampling, a lightweight, single-model method that adapts the temperature at each decoding step using the entropy of the next-token distribution. EDT sets the step temperature as $T = T_0 \cdot \mathcal{N}^{\frac{\theta}{\text{Entropy}}}$ with $\mathcal{N}=0.8$, balancing generation quality and diversity while using less memory than KL-divergence-based approaches. Across four benchmarks (covering summarization, QA, and translation) and metrics including ROUGE-L, BLEU, Self-BLEU, and an EDA trade-off score, EDT consistently outperforms fixed-temperature and KL-based dynamic sampling, with token-level control performing best. The method is implemented on LLaMA-2-13B with LoRA fine-tuning and keeps inference cost close to that of fixed-temperature decoding.
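
To make the formula concrete, here is a small worked example. It assumes $T_0$ acts as a base temperature and $\theta$ as a positive scaling hyperparameter; this summary does not define either, and the values $T_0 = 1$ and $\theta = 1$ are chosen purely for illustration.

$$
\begin{aligned}
\text{Entropy} = 0.5 &: \quad T = 1 \cdot 0.8^{1/0.5} = 0.8^{2} = 0.64 \\
\text{Entropy} = 1 &: \quad T = 1 \cdot 0.8^{1/1} = 0.8 \\
\text{Entropy} = 2 &: \quad T = 1 \cdot 0.8^{1/2} \approx 0.894
\end{aligned}
$$

Because $0 < \mathcal{N} < 1$, the exponent $\theta / \text{Entropy}$ shrinks as entropy grows, so $T \to T_0$ when the model is uncertain and $T \to 0$ when it is confident. Under these assumptions, $T_0$ is an upper bound on the step temperature.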

Abstract

Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy for LLMs' generation process. However, a fixed temperature parameter is used in most cases, which may not always be an optimal choice for balancing generation quality and diversity. In this paper, we propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method to achieve a more balanced performance in terms of both generation quality and diversity by dynamically selecting the temperature parameter. We also report model performance and comprehensive analyses on 4 different generation benchmarks. Our experiments show that EDT significantly outperforms the existing strategies across different tasks.

Paper Structure (24 sections, 11 equations, 7 figures, 6 tables)


Figures (7)

  • Figure 1: Temperature distribution for optimal generation quality score on four datasets at the single-instance level. The horizontal axis represents the number of instances. All experiment settings follow those in the experiments section. This result is discussed in the preliminary study. It shows that a fixed temperature cannot adequately meet our needs.
  • Figure 2: Illustration of the decoding process with our EDT. At every decoding step, the system first obtains the logits (➀) and generates the probability distribution of the next token (➁). Then, based on the entropy (➂) of the initial probability distribution, the model chooses the temperature (➃), obtains the new distribution (➄), and samples the next token (➅).
  • Figure 3: XLSum
  • Figure 4: QuAC
  • Figure 5: MS MARCO v1.1
  • ...and 2 more figures
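
The six-step decoding process described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the defaults `t0=1.0` and `theta=1.0`, the use of natural-log entropy, and the small numerical floors are all assumptions made for this example. Only the formula $T = T_0 \cdot \mathcal{N}^{\theta/\text{Entropy}}$ with $\mathcal{N}=0.8$ comes from the summary above.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


def entropy(probs):
    """Shannon entropy in nats (the log base is an assumption here)."""
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())


def edt_temperature(h, t0=1.0, theta=1.0, base=0.8, eps=1e-8):
    """T = t0 * base ** (theta / h); eps avoids division by zero (assumed guard)."""
    return t0 * base ** (theta / max(h, eps))


def edt_sample(logits, rng, t0=1.0, theta=1.0, base=0.8, min_temp=1e-6):
    """One decoding step following steps 1-6 of the Figure 2 caption."""
    probs = softmax(logits)                      # steps 1-2: logits -> initial distribution
    h = entropy(probs)                           # step 3: entropy of that distribution
    temp = edt_temperature(h, t0, theta, base)   # step 4: choose the temperature
    temp = max(temp, min_temp)                   # assumed floor: keeps the division finite
    new_probs = softmax(logits, temp)            # step 5: re-scaled distribution
    token = int(rng.choice(len(new_probs), p=new_probs))  # step 6: sample next token
    return token, temp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for name, logits in [("peaked", [10.0, 0.0, 0.0, 0.0]), ("flat", [0.0, 0.0, 0.0, 0.0])]:
        token, temp = edt_sample(logits, rng)
        print(f"{name}: token={token}, temperature={temp:.4f}")
```

In this sketch a sharply peaked distribution drives the temperature toward zero, so sampling becomes nearly greedy, while a flat distribution keeps the temperature closer to `t0` and preserves diversity.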