Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng
TL;DR
This work tackles the challenge of abstract creativity in diffusion-based image synthesis by redefining 'creative' as a universal token <CreTok>, learned through a TP2O-oriented CangJie dataset. The approach enables zero-shot combinatorial generation without task-specific retraining, improving text–image alignment and human-perceived creativity. By optimizing the cosine similarity between the text embeddings of restrictive and adaptive prompts and continually refining <CreTok> over diverse text pairs, the method achieves cohesive fusion of concepts (e.g., Lettuce and Mantis) and extends to multi-concept CT2I tasks, while remaining efficient (≈4 seconds per image). Evaluations, including GPT-4o scoring and a user study, show that CreTok outperforms state-of-the-art diffusion models and existing creative-generation methods in integration, originality, and aesthetics, and that it generalizes across styles and prompts.
Abstract
"Creative" remains an inherently abstract concept for both humans and diffusion models. While text-to-image (T2I) diffusion models can easily generate out-of-distribution concepts like "a blue banana", they struggle to generate combinatorial objects such as "a creative mixture that resembles a lettuce and a mantis", because they have difficulty understanding the semantic depth of "creative". Current methods rely heavily on synthesizing reference prompts or images to achieve a creative effect, and typically require retraining for each unique creative output, a process that is computationally intensive and limits practical applications. To address this, we introduce CreTok, which brings meta-creativity to diffusion models by redefining "creative" as a new token, <CreTok>, thus enhancing models' semantic understanding for combinatorial creativity. CreTok achieves this redefinition by iteratively sampling diverse text pairs from our proposed CangJie dataset to form adaptive prompts and restrictive prompts, and then optimizing the similarity between their respective text embeddings. Extensive experiments demonstrate that <CreTok> enables the universal and direct generation of combinatorial creativity across diverse concepts without additional training, achieving state-of-the-art performance with improved text-image alignment and higher human preference ratings. Code will be made available at https://github.com/fu-feng/CreTok.
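
The token-learning idea described in the abstract (sample a text pair, build a restrictive and an adaptive prompt, and push their text embeddings together) can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the mean-pooling `encode` function stands in for a frozen diffusion text encoder, `VOCAB` and `PAIRS` stand in for the CangJie dataset, the `"fused"` descriptor and the way the two prompts are composed are hypothetical, and plain gradient ascent on cosine similarity replaces whatever optimizer the actual method uses.

```python
import math
import random

DIM = 8
rng = random.Random(0)


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


# Hypothetical frozen word embeddings (stand-in for a real text encoder).
VOCAB = {w: rand_vec() for w in
         ["lettuce", "mantis", "banana", "owl", "teapot", "fused"]}
# Hypothetical text pairs (stand-in for pairs sampled from a dataset).
PAIRS = [("lettuce", "mantis"), ("banana", "owl"), ("teapot", "mantis")]


def encode(vectors):
    """Toy encoder: mean-pool the token vectors of a prompt."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(DIM)]


def norm(u):
    return math.sqrt(sum(x * x for x in u))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def cosine_grad(u, v):
    """Gradient of cosine(u, v) with respect to u."""
    nu, nv, c = norm(u), norm(v), cosine(u, v)
    return [v[i] / (nu * nv) - c * u[i] / (nu * nu) for i in range(DIM)]


def prompts(token, a, b):
    # Restrictive prompt: learnable token plus the two concepts.
    restrictive = encode([token, VOCAB[a], VOCAB[b]])
    # Adaptive prompt: a fixed descriptive word plus the same concepts.
    adaptive = encode([VOCAB["fused"], VOCAB[a], VOCAB[b]])
    return restrictive, adaptive


def mean_similarity(token):
    return sum(cosine(*prompts(token, a, b)) for a, b in PAIRS) / len(PAIRS)


def train(steps=300, lr=0.5):
    token = rand_vec()  # the only trainable parameter
    initial = mean_similarity(token)
    for _ in range(steps):
        a, b = rng.choice(PAIRS)  # iteratively sample a text pair
        restrictive, adaptive = prompts(token, a, b)
        grad = cosine_grad(restrictive, adaptive)
        # restrictive = (token + a + b) / 3, so the chain rule adds a 1/3 factor.
        token = [t + lr * g / 3.0 for t, g in zip(token, grad)]
    return token, initial, mean_similarity(token)


if __name__ == "__main__":
    _, before, after = train()
    print(f"mean cosine similarity: {before:.3f} -> {after:.3f}")
```

In this sketch only the token vector is updated while the "encoder" and word embeddings stay fixed, which mirrors the claim that a single learned token can then be reused across concept pairs without further training.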
