Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms

Parham Rezaei, Farzan Farnia, Cheuk Ting Li

TL;DR

The paper investigates whether mixtures of pretrained generative models can outperform any single model on diversity and quality metrics. It introduces Mixture-UCB, a bandit-based online framework that optimizes a quadratic mixture loss L(α) via continuum-armed bandits and online gradient methods, with provable convergence guarantees. The approach is instantiated with metrics including Rényi Kernel Entropy (RKE) and Kernel Inception Distance (KID), as well as combinations with Precision/Density to balance quality and diversity. Empirical results across synthetic, image, and text generation demonstrate that learned mixtures yield higher diversity scores and competitive or improved quality relative to best-in-class single models, across multiple datasets and modalities. The work provides code and a path toward practical deployment of optimal model mixtures in real-world generative pipelines.

Abstract

The availability of multiple training algorithms and architectures for generative models requires a selection mechanism to form a single model over a group of well-trained generation models. The selection task is commonly addressed by identifying the model that maximizes an evaluation score based on the diversity and quality of the generated data. However, such a best-model identification approach overlooks the possibility that a mixture of available models can outperform each individual model. In this work, we numerically show that a mixture of generative models on benchmark image datasets can indeed achieve a better evaluation score (based on FID and KID scores), compared to the individual models. This observation motivates the development of efficient algorithms for selecting the optimal mixture of the models. To address this, we formulate a quadratic optimization problem to find an optimal mixture model achieving the maximum of kernel-based evaluation scores including kernel inception distance (KID) and Rényi kernel entropy (RKE). To identify the optimal mixture of the models using the fewest possible sample queries, we view the selection task as a multi-armed bandit (MAB) problem and propose the Mixture Upper Confidence Bound (Mixture-UCB) algorithm that provably converges to the optimal mixture of the involved models. More broadly, the proposed Mixture-UCB can be extended to optimize every convex quadratic function of the mixture weights in a general MAB setting. We prove a regret bound for the Mixture-UCB algorithm and perform several numerical experiments to show the success of Mixture-UCB in finding the optimal mixture of text and image generative models. The project code is available at https://github.com/Rezaei-Parham/Mixture-UCB.
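To illustrate why a kernel-based score becomes a quadratic function of the mixture weights, here is a minimal sketch. It is not the authors' implementation. It assumes the order-2 RKE score is the reciprocal of $\mathbb{E}[k(X,X')^2]$ for a normalized kernel, so that for a mixture with weights $\boldsymbol{\alpha}$ the quantity to minimize is $\boldsymbol{\alpha}^\top K \boldsymbol{\alpha}$ with $K_{ij}=\mathbb{E}_{x\sim P_i,\,x'\sim P_j}[k(x,x')^2]$. The function names, the Gaussian kernel bandwidth, the toy 2-D "generators", and the plain projected-gradient solver are all illustrative assumptions. The solver uses all samples up front and does not implement the sample-efficient Mixture-UCB procedure.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x and rows of y."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def pairwise_kernel_matrix(samples, bandwidth=1.0):
    """Estimate K[i, j] = E[k(x, x')^2] with x ~ P_i and x' ~ P_j."""
    m = len(samples)
    K = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):
            k2 = gaussian_kernel(samples[i], samples[j], bandwidth) ** 2
            if i == j:
                # Exclude the diagonal (x paired with itself) for an unbiased estimate.
                n = k2.shape[0]
                K[i, i] = (k2.sum() - np.trace(k2)) / (n * (n - 1))
            else:
                K[i, j] = K[j, i] = k2.mean()
    return K

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def minimize_quadratic_on_simplex(K, steps=500):
    """Projected gradient descent on L(alpha) = alpha^T K alpha."""
    m = K.shape[0]
    alpha = np.full(m, 1.0 / m)
    lr = 1.0 / (2.0 * np.linalg.norm(K, 2))  # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        alpha = project_to_simplex(alpha - lr * 2.0 * K @ alpha)
    return alpha

# Toy stand-ins for two generators: each covers a different mode (hypothetical data).
rng = np.random.default_rng(0)
samples = [
    rng.normal(loc=(-3.0, 0.0), scale=0.5, size=(300, 2)),
    rng.normal(loc=(3.0, 0.0), scale=0.5, size=(300, 2)),
]
K = pairwise_kernel_matrix(samples)
alpha = minimize_quadratic_on_simplex(K)

print("mixture weights:", alpha)
print("loss of best single model:", K.diagonal().min())
print("loss of learned mixture:", alpha @ K @ alpha)
```

Because the two toy generators cover disjoint modes, the off-diagonal entries of `K` are close to zero and the mixture attains a lower quadratic loss (a higher RKE under the stated assumption) than either model alone. A KID-style objective would add a linear term in $\boldsymbol{\alpha}$ comparing each model to the reference data, which keeps the problem quadratic.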

Paper Structure

This paper contains 38 sections, 3 theorems, 50 equations, 13 figures, 2 tables, 3 algorithms.

Key Result

Theorem 1

Fix a probability vector $\boldsymbol{\alpha}$. (Note: Theorem thm:Lhat holds for a fixed $\boldsymbol{\alpha}$; a worst-case bound that holds simultaneously for every $\boldsymbol{\alpha}$ is given in Lemma lem:Fhat_sup.) Suppose we have samples $x_{i,1},\ldots,x_{i,n_{i}}$ from the distribution $P_{i}$ for $i=1,\ldots$ […], where $\boldsymbol{\epsilon}(\delta):=(\Delta_{L}\sqrt{\frac{\log(1/\delta)}{2n_{i}}}+\ldots$ […]

Figures (13)

  • Figure 1: A mixture (right-most case) of FFHQ pre-trained generative models with weights (0.25,0.4,0.01,0.08,0.26) achieves better FID and KID scores compared to each of the five involved models. The mixture weights are computed using our proposed Mixture-UCB-OGD algorithm.
  • Figure 2: Visual comparison of the diversity across individual arms and the optimal mixture for images generated using models Kandinsky 3, Stable Diffusion 3, and PixArt-$\alpha$ with the prompt "Red bird, cartoon style". The mixture weights are computed via the Mixture-UCB-OGD method.
  • Figure 3: Performance comparison of online algorithms for the KID metric across FFHQ, LSUN-Bedroom, and FFHQ Truncated generators.
  • Figure 4: Performance comparison of online algorithms using RKE score of T2I generative models.
  • Figure 5: Performance comparison of online algorithms using the combination of RKE with Precision and RKE with Density metrics.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Lemma 1