Financial Models in Generative Art: Black-Scholes-Inspired Concept Blending in Text-to-Image Diffusion
Divya Kothandaraman, Ming Lin, Dinesh Manocha
TL;DR
The paper addresses concept blending in pretrained text-to-image diffusion models by automatically selecting the most informative prompt at each denoising step through a Black-Scholes-inspired scoring mechanism. It maps diffusion dynamics to option pricing, using CLIP-based scores as asset prices and a fixed strike price to guide prompt conditioning, which enables data-efficient, training-free blending. The method outperforms baselines such as linear interpolation, alternating sampling, step-wise prompt switching, and CLIP-guided prompt selection, both qualitatively and quantitatively, across diverse prompts and scenes. This cross-domain approach demonstrates that probabilistic tools from finance can enhance compositional synthesis in generative AI, and it suggests broader applications and extensions to other diffusion-based tasks.
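The option-pricing side of this analogy can be illustrated with the textbook Black-Scholes price of a European call, C = S·N(d1) - K·e^(-rT)·N(d2). The Python below is a minimal sketch and not the authors' implementation. Two elements follow the summary above: a prompt's CLIP-style score plays the role of the asset price S, and the strike K is fixed. Everything else is an assumption made for illustration:

- T is taken as the fraction of denoising steps remaining.
- σ is estimated from the log-returns of each prompt's score history.
- r is set to 0.
- The rule picks the prompt with the largest call value.

The helpers `volatility` and `select_prompt` are hypothetical names, and scores are assumed strictly positive so that the logarithms are defined.

```python
import math
import statistics
from typing import Dict, List, Tuple


def norm_cdf(x: float) -> float:
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call option (requires S, K > 0)."""
    if T <= 0.0 or sigma <= 0.0:
        # No time or no volatility: deterministic payoff, S minus the
        # discounted strike, floored at zero.
        return max(S - K * math.exp(-r * max(T, 0.0)), 0.0)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def volatility(history: List[float]) -> float:
    """Hypothetical: sample std. dev. of log-returns of a score series across steps."""
    if len(history) < 3:
        return 0.0  # fewer than two returns: treat as zero volatility
    log_returns = [math.log(b / a) for a, b in zip(history, history[1:])]
    return statistics.stdev(log_returns)


def select_prompt(
    score_histories: Dict[str, List[float]],
    strike: float,
    step: int,
    num_steps: int,
    r: float = 0.0,
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical rule: price a call on each prompt's latest score, pick the largest."""
    T = (num_steps - step) / num_steps  # assumed: remaining fraction of denoising steps
    values = {
        prompt: black_scholes_call(hist[-1], strike, T, r, volatility(hist))
        for prompt, hist in score_histories.items()
    }
    return max(values, key=values.get), values
```

Suppose every prompt shared one strike and identical σ and T. A call's value rises monotonically in S, so the rule would reduce to ranking raw scores. In this sketch, the per-prompt volatility estimate is what lets a lower but more volatile score win. For example, take the histories {"a": [28, 28, 28], "b": [24, 30, 26]} with strike 30, at step 10 of 50. Here "b" is selected even though "a" currently scores higher.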
Abstract
We introduce a novel approach for concept blending in pretrained text-to-image diffusion models, aiming to generate images at the intersection of multiple text prompts. At each time step of diffusion denoising, our algorithm forecasts predictions of the generated image and uses them to make informed text-conditioning decisions. Central to our method is a unique analogy between diffusion models, which are rooted in non-equilibrium thermodynamics, and the Black-Scholes model for financial option pricing. By drawing parallels between key variables in both domains, we derive a robust algorithm for concept blending that capitalizes on the Markovian dynamics of the Black-Scholes framework. Our text-based concept blending algorithm is data-efficient: it requires no additional training. Furthermore, it operates without human intervention or hyperparameter tuning. We highlight the benefits of our approach by comparing it qualitatively and quantitatively to other text-based concept blending techniques, including linear interpolation, alternating prompts, step-wise prompt switching, and CLIP-guided prompt selection. The comparisons span various scenarios: a single object per text prompt, multiple objects per text prompt, and objects against backgrounds. Our work shows that financially inspired techniques can enhance text-to-image concept blending in generative AI, paving the way for broader innovation. Code is available at https://github.com/divyakraman/BlackScholesDiffusion2024.
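For contrast, the text-conditioning baselines named in the abstract can be written as simple per-step schedules. The sketch below is illustrative only, and none of its names or settings come from the paper or its repository:

- Prompt embeddings are plain lists of floats.
- The blend weight alpha = 0.5 and the switch point at half the steps are assumed defaults.
- `clip_guided` is a greedy stand-in that takes precomputed scores rather than calling a CLIP model.

```python
from typing import Dict, List

Embedding = List[float]


def linear_interpolation(e_a: Embedding, e_b: Embedding, alpha: float = 0.5) -> Embedding:
    """Condition every step on the element-wise blend (1 - alpha) * e_a + alpha * e_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


def alternating(step: int, e_a: Embedding, e_b: Embedding) -> Embedding:
    """Condition on prompt A at even steps and on prompt B at odd steps."""
    return e_a if step % 2 == 0 else e_b


def stepwise_switch(step: int, num_steps: int, e_a: Embedding, e_b: Embedding,
                    switch_frac: float = 0.5) -> Embedding:
    """Condition on prompt A for the first switch_frac of the steps, then on prompt B."""
    return e_a if step < switch_frac * num_steps else e_b


def clip_guided(scores: Dict[str, float]) -> str:
    """Greedy selection: the prompt whose current prediction has the highest CLIP-style score."""
    return max(scores, key=scores.get)
```

In these sketches, the first three schedules are fixed before sampling begins, and the greedy rule reacts only to the current score. The method described in the abstract instead forecasts the generated image at each step before choosing the conditioning.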
