Assessing and Understanding Creativity in Large Language Models
Yunpu Zhao, Rui Zhang, Wenyi Li, Di Huang, Jiaming Guo, Shaohui Peng, Yifan Hao, Yuanbo Wen, Xing Hu, Zidong Du, Qi Guo, Ling Li, Yunji Chen
TL;DR
This work addresses the challenge of quantifying creativity in large language models by adapting the Torrance Tests of Creative Thinking (TTCT) into a scalable, automated framework. It builds a 700-question dataset across seven TTCT-inspired verbal tasks and evaluates six LLMs using four criteria: Fluency, Flexibility, Originality, and Elaboration, with GPT-4 as the scoring agent and human validation for reliability. The study shows substantial variation in creativity across models, prompt types, and role-play settings, with elaboration typically strong and originality variable; collaboration among models and alignment with personality traits can further influence creativity. The findings offer a practical, scalable methodology for AI creativity assessment and illuminate how model design, prompting, and psychometric factors shape creative output, bridging AI behavior with human cognitive theories and potential applications.
Abstract
In the field of natural language processing, the rapid development of large language models (LLMs) has attracted growing attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity are inadequate. Assessing LLM creativity must account for differences from humans and requires multi-dimensional measurement that balances accuracy and efficiency. This paper aims to establish an efficient framework for assessing the level of creativity in LLMs. By adapting the modified Torrance Tests of Creative Thinking, we evaluate the creative performance of various LLMs across 7 tasks on 4 criteria: Fluency, Flexibility, Originality, and Elaboration. To this end, we develop a comprehensive dataset of 700 test questions and an LLM-based evaluation method. In addition, this study presents a novel analysis of LLMs' responses to diverse prompts and role-play situations. We find that the creativity of LLMs primarily falls short in originality, while excelling in elaboration. The prompts used and the role-play settings of the model also significantly influence creativity. The experimental results further indicate that collaboration among multiple LLMs can enhance originality. Notably, our findings reveal a consensus between human evaluations and LLMs regarding the personality traits that influence creativity. The findings underscore the significant impact of LLM design on creativity and bridge artificial intelligence and human creativity, offering insights into LLMs' creativity and potential applications.
