Vibe Spaces for Creatively Connecting and Expressing Visual Concepts
Huzheng Yang, Katherine Xu, Andrew Lu, Michael D. Grossberg, Yutong Bai, Jianbo Shi
TL;DR
This paper tackles creative visual blending by identifying and merging the most relevant shared attributes—'vibes'—between images. It introduces Vibe Space, a hierarchical graph manifold learned on a multiscale diffusion framework to produce non-linear geodesics in ambient feature spaces like CLIP, enabling coherent Vibe Blending and Vibe Analogy. The authors develop a cognitively inspired evaluation framework combining human judgments, LLM reasoning, and a path nonlinearity score (PNS) to measure blend creativity and difficulty, demonstrating superior creativity and coherence over strong baselines on challenging pairs. They also propose mechanisms for creative control, extrapolation, and negative vibe suppression, offering practical tools for controllable, image-conditioned creative synthesis with efficient training and inference.
Abstract
Creating new visual concepts often requires connecting distinct ideas through their most relevant shared attributes: their vibe. We introduce Vibe Blending, a novel task for generating coherent and meaningful hybrids that reveal these shared attributes between images. Achieving such blends is challenging for current methods, which struggle to identify and traverse nonlinear paths linking distant concepts in latent space. We propose Vibe Space, a hierarchical graph manifold that learns low-dimensional geodesics in feature spaces like CLIP, enabling smooth and semantically consistent transitions between concepts. To evaluate creative quality, we design a cognitively inspired framework combining human judgments, LLM reasoning, and a geometric path-based difficulty score. We find that Vibe Space produces blends that humans consistently rate as more creative and coherent than current methods.
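To make the contrast between linear interpolation and a graph geodesic concrete, the following is a minimal toy sketch and not the Vibe Space method itself. It assumes a stand-in 2-D "feature space" in which the data lie on a half circle, builds a plain k-nearest-neighbor graph, and finds a shortest path with Dijkstra's algorithm. The helper names `knn_graph` and `shortest_path`, the toy data, and the length ratio at the end are all illustrative assumptions. In particular, the ratio is not the paper's path-based difficulty score.

```python
import heapq
import numpy as np

def knn_graph(points, k):
    """Build adjacency lists {i: [(j, dist), ...]} for a symmetrized k-NN graph."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adj = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        # Index 0 of the argsort is the point itself, so skip it.
        for j in np.argsort(dists[i])[1:k + 1]:
            adj[i].append((int(j), float(dists[i, j])))
            adj[int(j)].append((i, float(dists[i, j])))
    return adj

def shortest_path(adj, src, dst):
    """Dijkstra's algorithm; returns (list of node indices, path length)."""
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in best:
        raise ValueError("dst is not reachable from src")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]

# Toy data (assumption): 50 points on the upper half of the unit circle.
angles = np.linspace(0.0, np.pi, 50)
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)

adj = knn_graph(points, k=2)
path, length = shortest_path(adj, 0, len(points) - 1)

# The straight-line midpoint leaves the data manifold (it lands near the origin).
linear_mid = 0.5 * (points[0] + points[-1])
# The midpoint of the graph path stays on the manifold (unit norm).
graph_mid = points[path[len(path) // 2]]

# Illustrative ratio of graph path length to straight-line distance.
ratio = length / np.linalg.norm(points[0] - points[-1])
print(np.linalg.norm(linear_mid), np.linalg.norm(graph_mid), ratio)
```

In this toy setting the linear midpoint lands at the center of the circle, far from every data point, while every point on the graph path remains on the arc. This mirrors the abstract's motivation for following nonlinear paths between distant concepts rather than interpolating directly in the ambient space.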
