Steering Large Language Models to Evaluate and Amplify Creativity
Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck, Shao-yen Tseng, Vasudev Lal
TL;DR
Large Language Models (LLMs) are poor judges of creativity. This paper proposes a mechanistic, activation-space approach to both evaluate and amplify it. A creativity attribute $a$ is derived from the contrast between the model's internal states under creative and uncreative prompts. At inference time, a creativity score is computed as the cosine similarity between the model's activations and $a$, and creativity is amplified by injecting $\lambda a$ into a chosen layer $l$, with $l=8$ and $\lambda=3$. The authors show that this attribute correlates with human judgments and that steering improves the creativity of generated text on a creative-writing dataset (Fan et al., 2018), using Llama3-8B and a larger frontier model (Llama3-70B) for evaluation. Key findings are that the proposed creativity score aligns strongly with human judgments, and that naive self-evaluation by the base model predicts creativity less well than evaluation by larger models. The work contributes a practical, inference-time method for both measuring and boosting creativity in generated text and extends the activation-space steering literature to creative domains, with potential implications for structured creative writing and data-efficient self-evaluation.
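The scoring and steering steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the attribute $a$ is the difference of mean activations between creative and uncreative prompts (the summary only says it comes from a contrast), it uses random vectors in place of real Llama3-8B layer-8 activations, and the function names `creativity_attribute`, `creativity_score`, and `steer` are hypothetical.

```python
import numpy as np


def creativity_attribute(creative_acts, uncreative_acts):
    """Assumed contrast: difference of mean activations (rows = prompts)."""
    return creative_acts.mean(axis=0) - uncreative_acts.mean(axis=0)


def creativity_score(hidden, a):
    """Cosine similarity between a hidden state and the attribute a."""
    denom = np.linalg.norm(hidden) * np.linalg.norm(a)
    return float(hidden @ a / denom) if denom > 0 else 0.0


def steer(hidden, a, lam=3.0):
    """Inject lam * a into a hidden state (lam = 3 as in the summary)."""
    return hidden + lam * a


# Synthetic stand-ins for activations collected at one layer.
rng = np.random.default_rng(0)
d = 16
direction = np.zeros(d)
direction[0] = 1.0
creative = rng.normal(size=(32, d)) + direction      # "creative" prompts
uncreative = rng.normal(size=(32, d)) - direction    # "uncreative" prompts

a = creativity_attribute(creative, uncreative)
h = rng.normal(size=d)            # hidden state for some new text
steered = steer(h, a, lam=3.0)    # same state after injecting 3 * a

print("score before steering:", creativity_score(h, a))
print("score after steering: ", creativity_score(steered, a))
```

Adding a positive multiple of $a$ to a hidden state can only move its cosine similarity with $a$ upward, so the second score is never lower than the first. In a real model the injection would be applied to the residual stream at the chosen layer during generation rather than to a single vector.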
Abstract
Although capable of generating creative text, Large Language Models (LLMs) are poor judges of what constitutes "creativity". In this work, we show that we can leverage an LLM's knowledge of how to write creatively in order to better judge what is creative. We take a mechanistic approach that extracts differences in the internal states of an LLM when it is prompted to respond "boringly" or "creatively", providing a robust measure of creativity that corresponds strongly with human judgment. We also show that these internal state differences can be applied to enhance the creativity of generated text at inference time.
