Combinatorial Creativity: A New Frontier in Generalization Abilities

Samuel Schapiro, Sumuk Shashidhar, Alexi Gladstone, Jonah Black, Royce Moon, Dilek Hakkani-Tur, Lav R. Varshney

TL;DR

The paper defines combinatorial creativity (CC) as an open-ended form of generalization and proposes a graph-based framework that evaluates outputs by their continuous novelty $N$ and utility $U$. It formalizes CC in a space of labeled concepts, modeling artifacts as labeled walks and encoding prompts as inclusion/exclusion constraints, which yields a measurable creativity score $\mathcal{C}(\theta)=\mathbb{E}[U\cdot N]$. Through large-scale experiments on decoder-only transformers at roughly 1M, 10M, and 100M parameters, it finds that creativity varies non-monotonically with depth and width, with optimal values that depend on scale and architecture. It also reveals a novelty-utility tradeoff that persists with scale. The ideation-execution gap is interpreted as a possible consequence of this tradeoff, with practical implications for improving AI creativity through architectural design, constraint modeling, and inference-time techniques. Overall, the work provides a foundational framework and empirical insights for understanding and advancing CC in modern AI.
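To make the framework concrete, here is a minimal Python sketch under stated assumptions. It represents the conceptual space as a set of (concept, relation, concept) triples, an artifact as a labeled walk through those triples, and a prompt as start and end concepts plus inclusion/exclusion sets. The `Prompt` class, the binary utility check `is_useful`, and the pluggable `novelty` function are hypothetical illustrations, not the paper's definitions. The paper's novelty is continuous and is given in its Definition 4, which this page does not reproduce. `creativity` is a plain Monte Carlo average of $U \cdot N$ over sampled (prompt, walk) pairs.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set, Tuple

# (head concept, relation label, tail concept): hypothetical encoding of the conceptual space
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Prompt:
    """Hypothetical creative prompt: connect `start` to `end` under constraints."""
    start: str
    end: str
    include: FrozenSet[str] = frozenset()  # concepts the walk must visit
    exclude: FrozenSet[str] = frozenset()  # concepts the walk must avoid


def is_useful(walk: List[Triple], prompt: Prompt, space: Set[Triple]) -> bool:
    """Assumed binary utility: a valid walk from start to end meeting all constraints."""
    if not walk or any(step not in space for step in walk):
        return False  # empty walk, or a step that is not an edge of the space
    if any(walk[i][2] != walk[i + 1][0] for i in range(len(walk) - 1)):
        return False  # consecutive steps must chain tail -> head
    if walk[0][0] != prompt.start or walk[-1][2] != prompt.end:
        return False  # endpoints must match the prompt
    visited = {walk[0][0]} | {tail for _, _, tail in walk}
    return prompt.include <= visited and not (prompt.exclude & visited)


def creativity(
    samples: List[Tuple[Prompt, List[Triple]]],
    space: Set[Triple],
    novelty: Callable[[List[Triple]], float],
) -> float:
    """Monte Carlo estimate of C = E[U * N] over sampled (prompt, walk) pairs."""
    if not samples:
        raise ValueError("need at least one sample")
    scores = [float(is_useful(w, p, space)) * novelty(w) for p, w in samples]
    return sum(scores) / len(scores)


# Toy usage: a tiny four-edge space, one valid walk and one invalid walk.
space = {
    ("bird", "has", "wing"),
    ("wing", "enables", "flight"),
    ("flight", "inspires", "airplane"),
    ("bird", "eats", "seed"),
}
prompt = Prompt("bird", "airplane", include=frozenset({"flight"}), exclude=frozenset({"seed"}))
good = [("bird", "has", "wing"), ("wing", "enables", "flight"), ("flight", "inspires", "airplane")]
bad = [("bird", "has", "wing"), ("wing", "inspires", "airplane")]  # second edge not in space
score = creativity([(prompt, good), (prompt, bad)], space, novelty=lambda w: float(len(w)))
print(score)  # 1.5
```

Walk length stands in for novelty here purely for illustration; any continuous novelty score with the same signature can be substituted without changing the rest of the sketch.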

Abstract

Artificial intelligence (AI) systems, and Large Language Models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, constituting a form of generalization from training data unaddressed by existing conceptual frameworks. Despite its similarities to compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. From here, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Importantly, this tradeoff remains persistent even at scale, casting doubt on the long-term creative potential of LLMs in their current form. Together, our conceptual framework and empirical findings provide a foundation for understanding and improving creativity in modern AI models, bridging the gap between human and machine intelligence.

Paper Structure

This paper contains 48 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Combinatorial creativity and cognitive associations. Since the seminal work of [associative_basis], creative ability in humans has long been associated with richer associative hierarchies [creativity_in_science], which are believed to enable combinations of distant representations [creative_comb_rep; simonton_discovery_invention_as_combinatorial; koestler_creation] that lead to breakthrough discovery.
  • Figure 2: An open-ended, algorithmic framework for evaluating combinatorial creativity (CC) abilities. A model is pre-trained on concept-relation-concept triples drawn from an underlying conceptual space. At test time, creative prompts ask the model to generate "ideas" connecting distant start and end concepts while adhering to increasingly many inclusion/exclusion logical constraints. Idea generation is done fully in-weights, not in-context, since CC involves recalling facts from memory.
  • Figure 3: The impact of width and depth on creativity. These heatmaps visualize the combinatorial creativity of models across three distinct parameter budgets (1M, 10M, and 100M). For each budget, the vertical axis represents the amount of training compute in FLOPs, the horizontal axis represents either the number of layers $L$ (depth panels) or the width-to-depth ratio $E/L$ (width panels), and the color intensity corresponds to the model's creativity score. The contours reveal a clear, non-monotonic trend: in the depth panels, creativity improves as layers are added up to a certain point, after which performance declines; in the width panels, creativity improves as width increases up to a certain point, after which performance also declines. The optimal depth becomes more pronounced at larger scales, with the 100M models achieving peak creativity at around 8 layers, while the optimal width corresponds to an $E/L$ ratio between 200 and 300.
  • Figure 4: The novelty-utility tradeoff persists across scales: These plots show the relationship between the number of utility constraints (x-axis) and the normalized novelty of generated creative artifacts (y-axis) for models of three different parameter scales: 1M, 10M, and 100M. Novelty is normalized by the mean novelty of simple, single-hop paths at each constraint level to isolate the effect of complexity. A clear downward trend is visible across all scales, indicating that as more utility constraints are imposed, the novelty of the generated artifacts tends to decrease.
  • Figure 5: The distribution of error types on the combinatorial creativity task. This plot shows the proportion of each error type among the creative artifacts that failed to satisfy the utility predicate (term 3 in Definition 5: Utility), plotted on a log scale.
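The novelty normalization described for Figure 4 can be sketched as follows. This assumes each generated artifact is recorded as (number of utility constraints, number of hops, novelty) and that the single-hop artifacts at each constraint level serve as that level's reference. Both the record layout and the function name `normalized_novelty_by_level` are hypothetical. Levels without a usable single-hop reference are skipped rather than guessed.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record: (number of utility constraints, number of hops, novelty score)
Record = Tuple[int, int, float]


def normalized_novelty_by_level(records: List[Record]) -> Dict[int, float]:
    """Mean novelty per constraint level, divided by that level's single-hop mean."""
    by_level: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for level, hops, nov in records:
        by_level[level].append((hops, nov))

    result: Dict[int, float] = {}
    for level in sorted(by_level):
        items = by_level[level]
        baseline = [nov for hops, nov in items if hops == 1]
        if not baseline or mean(baseline) == 0:
            continue  # no usable single-hop reference at this level
        ref = mean(baseline)
        result[level] = mean(nov / ref for _, nov in items)
    return result


records = [
    (0, 1, 0.25), (0, 1, 0.75), (0, 3, 1.5),  # no utility constraints
    (2, 1, 0.5), (2, 1, 0.5), (2, 4, 0.25),   # two utility constraints
]
print(normalized_novelty_by_level(records))
```

On these toy numbers, which are made up for illustration and not taken from the paper, the normalized value drops from level 0 to level 2, mirroring the shape of the downward trend the caption describes.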

Theorems & Definitions (6)

  • Definition 1: Conceptual Space
  • Definition 2: Creative Artifact
  • Definition 3: Creative Prompt
  • Definition 4: Novelty
  • Definition 5: Utility
  • Definition 6: Creativity