A Multi-Armed Bandit Approach to Online Selection and Evaluation of Generative Models

Xiaoyan Hu, Ho-fung Leung, Farzan Farnia

TL;DR

The paper addresses the high cost of offline evaluation for deep generative models by formulating online model selection as a multi-armed bandit problem. It develops two algorithms, FD-UCB and IS-UCB, that provide data-dependent, optimistic bounds for Fréchet Distance and Inception Score, respectively, and proves sub-linear regret guarantees. Empirical results across CIFAR-10, ImageNet, FFHQ, and AFHQ demonstrate that online, sample-efficient evaluation can accurately identify top-performing models while reducing query costs. The work enables practical, cost-effective ranking of generative models and offers a foundation for extending to other modalities and contextual settings.

Abstract

Existing frameworks for evaluating and comparing generative models consider an offline setting, where the evaluator has access to large batches of data produced by the models. However, in practical scenarios, the goal is often to identify and select the best model using the fewest possible generated samples to minimize the costs of querying data from the sub-optimal models. In this work, we propose an online evaluation and selection framework to find the generative model that maximizes a standard assessment score among a group of available models. We view the task as a multi-armed bandit (MAB) and propose upper confidence bound (UCB) bandit algorithms to identify the model producing data with the best evaluation score that quantifies the quality and diversity of generated data. Specifically, we develop the MAB-based selection of generative models considering the Fréchet Distance (FD) and Inception Score (IS) metrics, resulting in the FD-UCB and IS-UCB algorithms. We prove regret bounds for these algorithms and present numerical results on standard image datasets. Our empirical results suggest the efficacy of MAB approaches for the sample-efficient evaluation and selection of deep generative models. The project code is available at https://github.com/yannxiaoyanhu/dgm-online-eval.
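The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's FD-UCB or IS-UCB: those algorithms use data-dependent confidence bounds tailored to the Fréchet Distance and the Inception Score, whereas the code below is the textbook UCB1 rule applied to a scalar per-query score in [0, 1]. The function name `ucb1_select`, the three hypothetical generators, and their mean scores are assumptions made only for illustration.

```python
import math
import random


def ucb1_select(pull, n_arms, horizon, c=2.0):
    """Textbook UCB1: pull(arm) must return a noisy score in [0, 1].

    Returns how many times each arm (generator) was queried.
    """
    counts = [0] * n_arms   # queries issued to each generator
    sums = [0.0] * n_arms   # running sum of observed scores

    def upper_bound(arm, step):
        # Empirical mean plus an exploration bonus (optimistic estimate).
        mean = sums[arm] / counts[arm]
        return mean + math.sqrt(c * math.log(step) / counts[arm])

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # query every generator once to initialise
        else:
            arm = max(range(n_arms), key=lambda a: upper_bound(a, t))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts


# Hypothetical generators: each query returns 1.0 with the given probability,
# standing in for a noisy per-batch quality score (higher is better).
means = [0.3, 0.5, 0.8]
rng = random.Random(0)
counts = ucb1_select(lambda a: 1.0 if rng.random() < means[a] else 0.0,
                     n_arms=3, horizon=2000)
best = counts.index(max(counts))
print(counts, best)
```

In this toy setting most of the 2000 queries go to the generator with the highest mean score, which is the behaviour the online framework aims for: few samples are spent on sub-optimal models.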

Paper Structure (29 sections, 18 theorems, 85 equations, 15 figures, 3 tables, 2 algorithms)

Key Result

Theorem 1

Assume for any generator $g$, the (random) embedding $f(X_g) \sim {\mathcal{N}}(\mu_g,\Sigma_g)$ follows a multivariate Gaussian, and the covariance matrix $\Sigma_\textup{r}$ of the real data is positive definite. Then, with probability at least $1-\delta$, we have for any $n\ge 4\bm{r}(\Sigma_g)+\log(3/\delta)$, where $\bm{r}(\Sigma_g):=\frac{\textup{Tr}[\Sigma_g]}{\|\Sigma_g\|_2}$ is the effec
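The sample-size condition in the theorem excerpt, $n\ge 4\bm{r}(\Sigma_g)+\log(3/\delta)$, can be checked numerically with a small sketch. It assumes the eigenvalues of $\Sigma_g$ are already available (for a positive semi-definite matrix the trace is the sum of the eigenvalues and the spectral norm is the largest one) and that $\log$ denotes the natural logarithm; the function names `effective_rank` and `min_samples` are hypothetical and do not come from the paper's code.

```python
import math


def effective_rank(eigenvalues):
    """r(Sigma) = Tr[Sigma] / ||Sigma||_2 from the eigenvalues of a PSD matrix."""
    top = max(eigenvalues)
    if top <= 0:
        raise ValueError("covariance must have a positive largest eigenvalue")
    return sum(eigenvalues) / top


def min_samples(eigenvalues, delta):
    """Smallest integer n with n >= 4 * r(Sigma) + log(3 / delta).

    Assumes the natural logarithm, which the excerpt does not state explicitly.
    """
    return math.ceil(4 * effective_rank(eigenvalues) + math.log(3 / delta))


# Hypothetical spectrum: one dominant direction and three weaker ones.
spectrum = [4.0, 2.0, 1.0, 1.0]
r = effective_rank(spectrum)           # (4 + 2 + 1 + 1) / 4
n_min = min_samples(spectrum, delta=0.05)
print(r, n_min)
```

The effective rank equals the ambient dimension for an isotropic covariance and shrinks when a few directions dominate, so under this condition embeddings with a fast-decaying spectrum need fewer samples.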

Figures (15)

  • Figure 1: FID-based evaluation and selection among CIFAR10 generative models: The standard offline evaluation requires a large batch of data from every model. In contrast, our proposed online approach leverages the UCB multi-armed bandit strategy to identify the best model with fewer generations from the suboptimal models.
  • Figure 2: Online FD-based evaluation and selection among standard generative models: The $x$-axis is the number of online steps. At each step, the algorithm samples a batch of five generated images from the chosen model. The image data embeddings are extracted by CLIP (Cherti et al., 2023). Results are averaged over 20 trials.
  • Figure 3: Online IS-based evaluation and selection among standard generative models: The $x$-axis is the number of steps. At each step, the algorithm samples a batch of five generated images from the chosen model. Results are averaged over 20 trials.
  • Figure 4: Online IS-based evaluation and selection among variance-controlled FFHQ models: IS-UCB can identify models that generate images with more diversity. Results are averaged over 20 trials.
  • Figure 5: Online FD-based evaluation and selection among three CIFAR10 models, including LOGAN, RESFLOW, and iDDPM-DDiM (Figure 1). The image data embeddings are extracted by InceptionV3.Net. Results are averaged over 20 trials.
  • ...and 10 more figures

Theorems & Definitions (38)

  • Theorem 1: Optimistic FD score
  • Remark 1: Model-dependent parameters
  • Theorem 2: Optimistic IS
  • proof
  • Theorem 3: Regret of FD-UCB
  • proof
  • Theorem 4: Concentration of empirical FD
  • proof
  • Lemma 1: $L_2$-norm error for mean and covariance matrix
  • proof
  • ...and 28 more