
Is Temperature the Creativity Parameter of Large Language Models?

Max Peeperkorn, Tom Kouwenhoven, Dan Brown, Anna Jordanous

TL;DR

The paper investigates whether the temperature parameter in large language models acts as the driver of creativity in narrative generation. Using a fixed context and an exemplar baseline, it combines computational embedding-space analyses with human judgments across seven temperature values to assess novelty, typicality, cohesion, and coherence. Results show only a weak link between temperature and novelty, a negative relation with coherence, and little to no relation with typicality or cohesion, indicating that the "creativity parameter" claim is overstated. It advocates exemplar-based evaluation and outlines directions for more controllable LLM creativity, including benchmarks and decoding strategies tailored to creative tasks.
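The TL;DR describes embedding-space measures taken relative to an exemplar, and a correlation between those measures and temperature. The sketch below shows one plausible way to compute such a relationship. It assumes that novelty is proxied by the cosine distance between a story's embedding and the exemplar's embedding, and that the temperature relationship is summarized with a Spearman rank correlation. This summary does not confirm either choice as the paper's exact method. The embeddings and temperatures in the usage part are hypothetical toy values, not data from the study.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means u and v point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy inputs: an exemplar embedding and one embedding per story.
exemplar = [1.0, 0.0, 0.0]
temperatures = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5]
story_embeddings = [
    [0.9, 0.1, 0.0],
    [0.95, 0.0, 0.1],
    [0.8, 0.3, 0.1],
    [0.9, 0.2, 0.0],
    [0.6, 0.5, 0.3],
    [0.85, 0.1, 0.2],
]

novelty = [cosine_distance(e, exemplar) for e in story_embeddings]
rho = spearman(temperatures, novelty)
print(f"Spearman rho(temperature, novelty) = {rho:.3f}")
```

A rank correlation is used here because it only assumes a monotonic relationship, not a linear one. With real data, one would also report significance and compute it separately for each measure (novelty, typicality, and so on).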

Abstract

Large language models (LLMs) are applied to all sorts of creative tasks, and their outputs vary from beautiful, to peculiar, to pastiche, into plain plagiarism. The temperature parameter of an LLM regulates the amount of randomness, leading to more diverse outputs; therefore, it is often claimed to be the creativity parameter. Here, we investigate this claim using a narrative generation task with a predetermined fixed context, model and prompt. Specifically, we present an empirical analysis of the LLM output for different temperature values using four necessary conditions for creativity in narrative generation: novelty, typicality, cohesion, and coherence. We find that temperature is weakly correlated with novelty, and unsurprisingly, moderately correlated with incoherence, but there is no relationship with either cohesion or typicality. However, the influence of temperature on creativity is far more nuanced and weak than suggested by the "creativity parameter" claim; overall results suggest that the LLM generates slightly more novel outputs as temperatures get higher. Finally, we discuss ideas to allow more controlled LLM creativity, rather than relying on chance via changing the temperature parameter.
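As the abstract notes, temperature regulates the amount of randomness in generation. A common formulation divides the next-token logits by the temperature T before applying the softmax. With T < 1 the distribution sharpens toward the most likely token, and with T > 1 it flattens. The sketch below illustrates that formulation only. It is not the specific decoding setup of Llama 2-Chat or of the paper, and the logits and temperatures are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize with a stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    max_scaled = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, more random distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    draws = [sample_token(logits, t, rng) for _ in range(10)]
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, "
          f"entropy={entropy(probs):.3f}, draws={draws}")
```

T only rescales the model's existing logits, and dividing by a positive number preserves their order. It therefore changes how often lower-ranked tokens are sampled, but not which tokens the model ranks highest.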

Paper Structure

This paper contains 28 sections, 1 equation, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: In (a), we see the effect that temperature has on perplexity, i.e., the quality of the output according to the LLM. Both (b) and (c) suggest that higher temperatures neither imply more diversity at the semantic or lexical level nor further extend the range of possible outputs in the current context.
  • Figure 2: Here, we show the PCA projections of 100 stories per temperature, generated using Llama 2-Chat 70B and the exact Llama 2 prompt used in the paper. While an increasing temperature seems to explore a larger region of the embedding space with a small number of samples, this does not imply that we access a broader slice of the model's probability distribution; we merely increase the chance of generating more diverse outputs. ★ denotes the exemplar story.
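Figure 1(a) reports perplexity, and Figure 2 reports PCA projections of story embeddings. The sketch below shows the standard definitions of both, assuming natural-log token probabilities and embeddings already produced by some model. The embedding model, the dimensionality, and all values in the usage part are hypothetical, and this is not the paper's analysis code. Perplexity is the exponential of the mean negative log-likelihood. To stay dependency-free, the PCA uses power iteration on the covariance matrix, orthogonalizing each new component against the earlier ones (Gram-Schmidt).

```python
import math
import random


def perplexity(token_logprobs):
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_components(v, basis):
    """Subtract the projection of v onto each unit-length basis vector."""
    for b in basis:
        coef = dot(v, b)
        v = [x - coef * y for x, y in zip(v, b)]
    return v


def pca_2d(rows, iters=1000, seed=0):
    """Project rows onto their top two principal components."""
    n, d = len(rows), len(rows[0])
    if n < 2 or d < 2:
        raise ValueError("need at least two rows and two dimensions")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # Random start, made orthogonal to the components already found.
        v = remove_components([rng.gauss(0, 1) for _ in range(d)], components)
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = remove_components([dot(row, v) for row in cov], components)
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left in this direction; keep v
                break
            v = [x / norm for x in w]
        components.append(v)
    return [[dot(r, c) for c in components] for r in centered]


# Hypothetical per-token log-probabilities for one generated story.
print(f"perplexity = {perplexity([-1.2, -0.4, -2.3, -0.8]):.2f}")

# Hypothetical 4-dimensional story embeddings; the exemplar is listed first.
embeddings = [
    [0.20, 0.10, 0.00, 0.30],
    [0.25, 0.05, 0.10, 0.35],
    [0.10, 0.30, 0.20, 0.10],
    [0.40, 0.00, 0.05, 0.50],
    [0.15, 0.20, 0.30, 0.05],
]
for point in pca_2d(embeddings):
    print([round(x, 3) for x in point])
```

Principal components are only defined up to sign, so a projection may appear mirrored relative to another run or another PCA implementation without any change in meaning.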