CREward: A Type-Specific Creativity Reward Model

Jiyeon Han, Ali Mahdavi-Amiri, Hao Zhang, Haedong Jeong

TL;DR

CREward formalizes creativity along three interpretable axes (geometry, material, and texture) and uses a human-aligned LVLM framework to produce a type-specific reward for guiding and evaluating creative image generation. By building CREBench, correlating human judgments with those of open and closed LVLMs, and training a lightweight reward head on LVLM-derived labels, the approach achieves strong alignment with human perception across all creativity types. CREward enables practical applications including type-aware creativity assessment, sampling for design inspiration, and controllable diffusion generation via LoRA sliders, with Grad-CAM highlighting the image regions most relevant to creativity. The authors note limitations such as a bias toward novelty and entanglement among creativity types, and suggest future work on disentanglement and on joint value assessment to complement novelty-focused signals. Overall, the work offers a scalable, interpretable framework for evaluating and steering creativity in generative visual systems.

Abstract

Creativity is a complex phenomenon. When it comes to representing and assessing creativity, treating it as a single undifferentiated quantity would appear naive and underwhelming. In this work, we learn the first type-specific creativity reward model, coined CREward, which spans three creativity "axes" (geometry, material, and texture) to allow us to view creativity through the lens of the image formation pipeline. To build our reward model, we first conduct a human benchmark evaluation to capture human perception of creativity for each type across various creative images. We then analyze the correlation between human judgments and predictions by large vision-language models (LVLMs), confirming that LVLMs exhibit strong alignment with human perception. Building on this observation, we collect LVLM-generated labels to train our CREward model that is applicable to both evaluation and generation of creative images. We explore three applications of CREward: creativity assessment, explainable creativity, and creative sample acquisition for both human design inspiration and guiding creative generation through low-rank adaptation.

Paper Structure

This paper contains 33 sections, 6 equations, 18 figures, 7 tables.

Figures (18)

  • Figure 1: Three types of creativity grounded in 3D rendering.
  • Figure 2: Overview of CreBench, CREward, and their applications. (a) LLM-driven prompts and various T2I models are used to collect creative generations. (b) Pairwise rankings are collected on randomly sampled image pairs along four types (geometry, material, texture, and overall) using either human annotators or an LVLM; together these comprise CreBench. (c) To distill LVLM judgments, we train a lightweight regressor (frozen vision backbone + MLP head) to predict type-wise scores, yielding CREward. (d) Applications enabled by CREward: creativity assessment, creativity explanation, and creative sample acquisition.
  • Figure 3: Winning rates ($\uparrow$) derived from preference labels on the benchmark dataset for human evaluation, LLM (Gemini-2.5), and our CREward.
  • Figure 4: Violin plots of reward distributions for creative generations from various representative diffusion models.
  • Figure 5: Top/Bottom 5 ranked generations for each creativity type from 100 LLM-generated (type-agnostic) creative prompts.
  • ...and 13 more figures
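
The captions for Figures 2 and 3 refer to pairwise preference labels and to winning rates derived from them. The page does not give the paper's exact definition, so the following is only a minimal sketch under one assumption: a winning rate is the number of comparisons an item wins divided by the number of comparisons it takes part in, computed separately per creativity type. The function name `winning_rates`, the `(winner, loser)` label format, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def winning_rates(preferences):
    """Compute per-item winning rates from pairwise preference labels.

    `preferences` is an iterable of (winner, loser) pairs for a single
    creativity type. This label format is an assumption for illustration.
    Returns a dict mapping each item to wins / comparisons.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for winner, loser in preferences:
        wins[winner] += 1
        comparisons[winner] += 1
        comparisons[loser] += 1
    return {item: wins[item] / comparisons[item] for item in comparisons}


# Hypothetical geometry-type labels over three image sources.
geometry_prefs = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_c", "model_b"),
    ("model_b", "model_a"),
]
rates = winning_rates(geometry_prefs)
for name in sorted(rates):
    print(f"{name}: {rates[name]:.2f}")
```

Running the same function on labels from different annotators (human, LVLM, or a learned reward) over the same pairs would give directly comparable rates, which appears to be the kind of comparison Figure 3 reports.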