A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans
Anca Dinu, Andra-Maria Florescu, Alina Resceanu
TL;DR
This paper presents a general linguistic creativity test for humans and Large Language Models (LLMs), focusing on word formation and metaphor use. Responses from 24 humans and 24 open-source LLMs, 2304 answers in total, are scored for Originality, Elaboration, and Flexibility using Open Creativity Scoring with AI (OCSAI). The results show that LLMs outperform humans in total creativity and in Originality and Flexibility, with notable exceptions and higher variance among LLMs. Humans lean toward E-creativity, while LLMs lean toward F-creativity. These findings suggest that LLMs can generate unseen terms and contextually appropriate expressions, with implications for knowledge engineering and adaptive vocabularies. They also highlight the diversity of human creative strategies and the need for caution in deployment.
Abstract
This paper introduces a general linguistic creativity test for humans and Large Language Models (LLMs). The test consists of various tasks that assess the ability to generate new, original words and phrases through word formation processes (derivation and compounding) and through metaphorical language use. We administered the test to 24 humans and to an equal number of LLMs, and we automatically evaluated their answers with the OCSAI tool on three criteria: Originality, Elaboration, and Flexibility. The results show that LLMs not only outperformed humans on all the assessed criteria, but also did better in six of the eight test tasks. We then computed the uniqueness of the individual answers, which showed some minor differences between humans and LLMs. Finally, we performed a short manual analysis of the dataset, which revealed that humans are more inclined towards E(xtending)-creativity, while LLMs favor F(ixed)-creativity.
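
The abstract does not say how uniqueness was computed. As an illustration only, here is a minimal sketch of one plausible reading: an answer counts as unique if no other answer in the same group matches it after simple normalization. The function names, the normalization rule, and the toy answers are all assumptions made for this example. None of them come from the paper's method or dataset.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal (assumed rule)."""
    return " ".join(answer.lower().split())


def uniqueness_rate(answers: list[str]) -> float:
    """Share of answers that occur exactly once in the group after normalization."""
    if not answers:
        return 0.0
    counts = Counter(normalize(a) for a in answers)
    unique = sum(1 for a in answers if counts[normalize(a)] == 1)
    return unique / len(answers)


# Hypothetical toy answers, invented for illustration; not from the paper's dataset.
human_answers = ["Snowquake", "frost storm", "snowquake", "ice hush"]
llm_answers = ["cryo-tempest", "glacier bloom", "Cryo-Tempest", "cryo-tempest"]

print(f"human uniqueness: {uniqueness_rate(human_answers):.2f}")
print(f"LLM uniqueness:   {uniqueness_rate(llm_answers):.2f}")
```

Comparing the two rates per task would give a rough view of how often each group repeats an answer. A stricter or looser notion of a match (for example, lemmatization or semantic similarity) would change the numbers.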
