A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans

Anca Dinu, Andra-Maria Florescu, Alina Resceanu

TL;DR

This paper presents a general linguistic creativity test for humans and Large Language Models (LLMs), focusing on word formation and metaphor use. Responses from 24 humans and 24 open-source LLMs, 2304 answers in total, are scored for Originality, Elaboration, and Flexibility using Open Creativity Scoring with AI (OCSAI). LLMs outperform humans in total creativity and in Originality and Flexibility, with notable exceptions and higher variance among LLMs. Humans lean toward E-creativity, while LLMs lean toward F-creativity. These findings suggest that LLMs can generate unseen terms and contextually appropriate expressions, with implications for knowledge engineering and adaptive vocabularies; they also highlight the diversity of human creative strategies and the need for caution in deployment.

Abstract

This paper introduces a general linguistic creativity test for humans and Large Language Models (LLMs). The test consists of various tasks aimed at assessing their ability to generate new, original words and phrases based on word formation processes (derivation and compounding) and on metaphorical language use. We administered the test to 24 humans and to an equal number of LLMs, and we automatically evaluated their answers using the OCSAI tool on three criteria: Originality, Elaboration, and Flexibility. The results show that LLMs not only outperformed humans on all the assessed criteria, but also did better in six of the eight test tasks. We then computed the uniqueness of the individual answers, which showed some minor differences between humans and LLMs. Finally, we performed a short manual analysis of the dataset, which revealed that humans are more inclined towards E(xtending)-creativity, while LLMs favor F(ixed)-creativity.

Paper Structure

This paper contains 11 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Humans’ versus LLMs’ mean scores per criterion
  • Figure 2: Humans’ versus LLMs’ mean scores per task
  • Figure 3: Statistics for Humans’ versus LLMs’ mean scores per criterion
  • Figure 4: Top uniqueness of the dataset