Steering Large Language Models to Evaluate and Amplify Creativity

Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck, Shao-yen Tseng, Vasudev Lal

TL;DR

Large Language Models (LLMs) are poor judges of creativity. The paper proposes a mechanistic, activation-space approach to both evaluate and amplify creativity. A creativity attribute $a$ is derived from the contrast between creative and uncreative prompts. At inference time, a creativity score is computed as the cosine similarity between the model's intermediate activations and $a$, and creativity is amplified by injecting $\lambda a$ into a chosen layer $l$ (with $l=8$ and $\lambda=3$). The authors show that this attribute correlates with human judgments and that steering improves the creativity of generated text on a creative-writing dataset (Fan2018Hierarchical), using Llama3-8B as the base model and a larger frontier model (Llama3-70B) for evaluation. Key findings are that the proposed creativity score aligns strongly with human judgments, and that naive self-evaluation by the base model predicts creativity less reliably than evaluation by the larger model. The work contributes a practical, inference-time method for both measuring and boosting creativity in generated text and extends the activation-space steering literature to creative domains, with potential implications for structured creative writing and data-efficient self-evaluation.
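The scoring and steering steps summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the difference-of-means contrast used to build $a$, the function names, and the array shapes are all assumptions made for illustration, and in practice the activations would be hidden states read from the chosen layer of the model.

```python
import numpy as np


def creativity_attribute(creative_acts: np.ndarray,
                         uncreative_acts: np.ndarray) -> np.ndarray:
    """Contrast vector a from two (num_prompts, hidden_dim) activation sets.

    Assumption: the contrast is a difference of mean activations; the
    summary only says that a is derived from a creative/uncreative contrast.
    """
    return creative_acts.mean(axis=0) - uncreative_acts.mean(axis=0)


def creativity_score(h: np.ndarray, a: np.ndarray) -> float:
    """Cosine similarity between one activation vector h and the attribute a."""
    denom = np.linalg.norm(h) * np.linalg.norm(a) + 1e-12  # guard against /0
    return float(h @ a / denom)


def steer(h: np.ndarray, a: np.ndarray, lam: float = 3.0) -> np.ndarray:
    """Inject lam * a into an activation vector (lambda = 3 in the summary)."""
    return h + lam * a


if __name__ == "__main__":
    # Synthetic stand-ins for layer activations (hypothetical data).
    rng = np.random.default_rng(0)
    creative = rng.normal(size=(32, 16)) + 1.0
    uncreative = rng.normal(size=(32, 16))

    a = creativity_attribute(creative, uncreative)
    h = uncreative[0]
    print("score before steering:", creativity_score(h, a))
    print("score after steering: ", creativity_score(steer(h, a), a))
```

Adding a positive multiple of $a$ to an activation can only move it closer in angle to $a$, so the score after steering is never lower than the score before; whether the generated text actually reads as more creative is the empirical question the paper evaluates.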

Abstract

Although capable of generating creative text, Large Language Models (LLMs) are poor judges of what constitutes "creativity". In this work, we show that we can leverage this knowledge of how to write creatively in order to better judge what is creative. We take a mechanistic approach that extracts differences in the internal states of an LLM when prompted to respond "boringly" or "creatively" to provide a robust measure of creativity that corresponds strongly with human judgment. We also show these internal state differences can be applied to enhance the creativity of generated text at inference time.

Paper Structure

This paper contains 7 sections, 2 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Left: We prompted Llama3-8B and Llama3-70B to assign a rating (0-9) to stories generated from a baseline prompt with no creative intervention (Baseline), a baseline prompt with creative intervention (Induced), and a creative version of the baseline prompt (Creative). Error bars are $95\%$ confidence intervals. Center: We compute the cosine similarity to the creativity attribute during text generation on the test split of the uncreative prompts, the uncreative prompts with creativity added, and the creative prompts. The intermediate activations of the latter two are much more similar to the creativity attribute. Right: Self, Frontier Model, and Human assessment accuracy in predicting which completion is more creative, given a pair of induced and baseline generations. The Llama3-8B model is a poor self-judge of creativity.
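
The pairwise accuracy in the right panel can be read as the fraction of (induced, baseline) pairs in which a judge rates the induced generation as more creative. The sketch below is a hypothetical illustration of that computation, not the authors' evaluation code: the function names, the tie handling (ties count as incorrect), and the normal-approximation interval are assumptions, and the source does not state how its confidence intervals were computed.

```python
import math
from typing import Sequence, Tuple


def pairwise_accuracy(induced_scores: Sequence[float],
                      baseline_scores: Sequence[float]) -> float:
    """Fraction of pairs where the induced generation scores strictly higher."""
    if len(induced_scores) != len(baseline_scores) or not induced_scores:
        raise ValueError("need two non-empty score lists of equal length")
    correct = sum(1 for i, b in zip(induced_scores, baseline_scores) if i > b)
    return correct / len(induced_scores)


def normal_ci95(p: float, n: int) -> Tuple[float, float]:
    """95% normal-approximation interval for a proportion (an assumed method)."""
    half = 1.96 * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


if __name__ == "__main__":
    # Hypothetical judge scores for four induced/baseline pairs.
    induced = [0.9, 0.2, 0.7, 0.6]
    baseline = [0.1, 0.4, 0.3, 0.5]
    acc = pairwise_accuracy(induced, baseline)
    print(acc, normal_ci95(acc, len(induced)))
```

The same function applies whether the scores come from a model's 0-9 ratings, from the cosine similarity to the creativity attribute, or from human annotators, which is what makes the three judges in the panel comparable.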