Financial Models in Generative Art: Black-Scholes-Inspired Concept Blending in Text-to-Image Diffusion

Divya Kothandaraman, Ming Lin, Dinesh Manocha

TL;DR

The paper addresses concept blending in pretrained text-to-image diffusion models by automatically selecting the most informative prompt at each denoising step through a Black-Scholes-inspired scoring mechanism. It maps diffusion dynamics to option pricing, using CLIP-based scores as asset prices and a fixed strike price to guide prompt conditioning, enabling data-efficient, training-free blending. The method outperforms baselines such as linear interpolation, alternating sampling, step-wise prompt switching, and CLIP-guided prompt selection in both qualitative and quantitative measures, across diverse prompts and scenes. This cross-domain approach demonstrates that financial probabilistic tools can enhance compositional synthesis in generative AI and suggests avenues for broader application and extension to other diffusion-based tasks.

Abstract

We introduce a novel approach for concept blending in pretrained text-to-image diffusion models, aiming to generate images at the intersection of multiple text prompts. At each time step during diffusion denoising, our algorithm forecasts predictions w.r.t. the generated image and makes informed text conditioning decisions. Central to our method is the unique analogy between diffusion models, which are rooted in non-equilibrium thermodynamics, and the Black-Scholes model for financial option pricing. By drawing parallels between key variables in both domains, we derive a robust algorithm for concept blending that capitalizes on the Markovian dynamics of the Black-Scholes framework. Our text-based concept blending algorithm is data-efficient, meaning it does not need additional training. Furthermore, it operates without human intervention or hyperparameter tuning. We highlight the benefits of our approach by comparing it qualitatively and quantitatively to other text-based concept blending techniques, including linear interpolation, alternating prompts, step-wise prompt switching, and CLIP-guided prompt selection, across various scenarios such as a single object per text prompt, multiple objects per text prompt, and objects against backgrounds. Our work shows that financially inspired techniques can enhance text-to-image concept blending in generative AI, paving the way for broader innovation. Code is available at https://github.com/divyakraman/BlackScholesDiffusion2024.
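
To make the option-pricing analogy concrete, the sketch below implements the standard Black-Scholes price of a European call and uses it to pick a prompt at one denoising step. It is only an illustration under stated assumptions, not the authors' implementation. The function names (`black_scholes_call`, `select_prompt`), the values of `r` and `sigma`, the mapping of remaining denoising steps to time-to-expiry, and the "pick the lowest-valued prompt" rule are all hypothetical. The only elements taken from the summary above are that CLIP-based scores play the role of asset prices and that a fixed strike price `K` is used.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s: float, k: float, t: float, r: float, sigma: float) -> float:
    """Standard Black-Scholes price of a European call option.

    s: asset price, k: strike price, t: time to expiry,
    r: risk-free rate, sigma: volatility.
    """
    if t <= 0.0:
        # At expiry the option is worth its intrinsic value.
        return max(s - k, 0.0)
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def select_prompt(clip_scores, strike, steps_left, total_steps, r=0.05, sigma=0.2):
    """Return the index of the prompt to condition on at this step.

    ASSUMPTIONS (hypothetical, for illustration only):
    - each prompt's CLIP score is treated as an asset price,
    - the fraction of denoising steps remaining is the time to expiry,
    - the prompt with the lowest option value (the concept that looks
      least expressed so far) is chosen next.
    """
    t = steps_left / total_steps
    values = [black_scholes_call(s, strike, t, r, sigma) for s in clip_scores]
    return min(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Two prompts with made-up CLIP scores and a made-up strike price.
    idx = select_prompt([0.30, 0.25], strike=0.28, steps_left=10, total_steps=50)
    print("condition on prompt index:", idx)
```

In a real pipeline the CLIP scores would be computed from the model's forecast of the final image at the current step, and the chosen index would determine which text embedding conditions the next denoising update. Both of those pieces are omitted here.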

Paper Structure (37 sections, 8 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Results of our method (Black-Scholes, last column) alongside comparisons to prior work. Vanilla Stable Diffusion (SD) struggles to capture clear characteristics of the individual text prompts (notably missing distinct features such as those of the parrot, cat, dog/penguin, and sunset/penguin). Linear interpolation performs poorly due to non-linear manifolds. Alternating sampling and step-wise switching yield low-quality results with artifacts, primarily because they lack intelligent prompt selection during denoising steps (missing characteristics of pizza, artifacts in cat/muffin and dog/penguin mixing, sunset/northern lights not well captured). CLIP-min does not model the diffusion denoising dynamics, which hinders far-sighted decision making, so its generated images are biased towards one of the text prompts. In contrast, our Black-Scholes model generates realistic images that balance and preserve the characteristics of each individual text prompt. The images are from set 1, set 2, set 3 and set 4, respectively (see the data section, sec:data).
  • Figure 2: More results of our method along with comparisons. Vanilla SD fails to capture clear characteristics of the individual text prompts, omitting distinct features such as those related to avocado/raccoon, muffin, parrot, and oil painting style. Linear interpolation generates images that are not consistent with the prompts, due to issues with non-linear manifolds. CLIP-min generates images biased towards one of the prompts. The alternating (Alt.) and step-wise (Step) prompt selection methods suffer from artifacts and do not blend the characteristics of the objects in the individual text prompts well: the avocado/raccoon and muffin/dog pairs are poorly blended, and in the parrot/dog image the characteristics of the parrot are missing. Alt. generates artifacts in the Times Square/oil painting image, while Times Square is not characterized well in the image generated by Step. In contrast, the Black-Scholes model overcomes these limitations, generating realistic images that balance and preserve the unique characteristics of each individual text prompt. The images are from set 1, set 2, set 3 and set 4, respectively (see the data section, sec:data).
  • Figure 3: Our method can be extended to concept blending involving more than 2 prompts (3 in this case), as shown above.
  • Figure 4: Image variations from our method using the Black-Scholes model, starting from different random Gaussian noise initializations.
  • Figure 5: An ablation experiment on the strike price $K$ shows that results are best when $K$ is close to the CLIP score of the image generated in the vanilla case using a combination of the constituent text prompts.
  • ...and 4 more figures