Combinatorial Creativity: A New Frontier in Generalization Abilities
Samuel Schapiro, Sumuk Shashidhar, Alexi Gladstone, Jonah Black, Royce Moon, Dilek Hakkani-Tur, Lav R. Varshney
TL;DR
The paper defines combinatorial creativity (CC) as an open-ended form of generalization and proposes a graph-based framework that evaluates outputs by continuous novelty $N$ and utility $U$. It formalizes CC in a labeled-concept space, where artifacts are modeled as labeled walks and prompts are encoded as inclusion/exclusion constraints, which yields a measurable creativity score $\mathcal{C}(\theta)=\mathbb{E}[U\cdot N]$. Large-scale experiments on decoder-only transformers at roughly $\sim$1M, 10M, and 100M parameters show that the depths and widths that maximize creativity depend on scale and architecture, that creativity varies non-monotonically with both, and that a novelty-utility tradeoff persists at every scale tested. The paper interprets the ideation-execution gap as a consequence of this tradeoff and draws practical implications for improving AI creativity through architectural design, constraint modeling, and inference-time techniques. Overall, the work offers a foundational framework and empirical insights for understanding and advancing CC in modern AI.
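As a rough illustration of the score $\mathcal{C}(\theta)=\mathbb{E}[U\cdot N]$, the sketch below estimates the expectation by averaging the product of utility and novelty over sampled outputs. The function name `creativity_score`, the sample values, and the assumption that both scores lie in $[0, 1]$ are hypothetical and are not taken from the paper; how $U$ and $N$ are actually computed on the concept graph is not shown here.

```python
from statistics import fmean


def creativity_score(samples):
    """Monte Carlo estimate of C = E[U * N] from (utility, novelty) pairs."""
    if not samples:
        raise ValueError("need at least one (utility, novelty) pair")
    return fmean(u * n for u, n in samples)


# Hypothetical (utility, novelty) scores for three sampled outputs,
# assumed here to lie in [0, 1]; these are not results from the paper.
samples = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]

score = creativity_score(samples)

# For comparison: the product of the separate means E[U] * E[N].
mean_utility = fmean(u for u, _ in samples)
mean_novelty = fmean(n for _, n in samples)
product_of_means = mean_utility * mean_novelty

print(f"E[U*N] estimate: {score:.4f}")
print(f"E[U]*E[N]:       {product_of_means:.4f}")
```

In this toy sample, utility falls as novelty rises, so the estimate of $\mathbb{E}[U\cdot N]$ comes out below $\mathbb{E}[U]\cdot\mathbb{E}[N]$. That is one simple way to see why a novelty-utility tradeoff would depress a product-based score even when each quantity looks reasonable on average.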
Abstract
Artificial intelligence (AI) systems, and Large Language Models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, which constitutes a form of generalization from training data that existing conceptual frameworks do not address. Despite its similarities to compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. Building on this framework, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Importantly, this tradeoff persists even at scale, casting doubt on the long-term creative potential of LLMs in their current form. Together, our conceptual framework and empirical findings provide a foundation for understanding and improving creativity in modern AI models, bridging the gap between human and machine intelligence.
