A Multi-Armed Bandit Approach to Online Selection and Evaluation of Generative Models

Xiaoyan Hu; Ho-fung Leung; Farzan Farnia

TL;DR

The paper addresses the high cost of offline evaluation for deep generative models by formulating online model selection as a multi-armed bandit problem. It develops two algorithms, FD-UCB and IS-UCB, that provide data-dependent, optimistic bounds for Fréchet Distance and Inception Score, respectively, and proves sub-linear regret guarantees. Empirical results across CIFAR-10, ImageNet, FFHQ, and AFHQ demonstrate that online, sample-efficient evaluation can accurately identify top-performing models while reducing query costs. The work enables practical, cost-effective ranking of generative models and offers a foundation for extending to other modalities and contextual settings.

Abstract

Existing frameworks for evaluating and comparing generative models consider an offline setting, where the evaluator has access to large batches of data produced by the models. However, in practical scenarios, the goal is often to identify and select the best model using the fewest possible generated samples to minimize the costs of querying data from the sub-optimal models. In this work, we propose an online evaluation and selection framework to find the generative model that maximizes a standard assessment score among a group of available models. We view the task as a multi-armed bandit (MAB) and propose upper confidence bound (UCB) bandit algorithms to identify the model producing data with the best evaluation score that quantifies the quality and diversity of generated data. Specifically, we develop the MAB-based selection of generative models considering the Fréchet Distance (FD) and Inception Score (IS) metrics, resulting in the FD-UCB and IS-UCB algorithms. We prove regret bounds for these algorithms and present numerical results on standard image datasets. Our empirical results suggest the efficacy of MAB approaches for the sample-efficient evaluation and selection of deep generative models. The project code is available at https://github.com/yannxiaoyanhu/dgm-online-eval.
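The bandit view described in the abstract can be illustrated with a generic UCB1 loop. This is a minimal sketch, not the paper's FD-UCB or IS-UCB: it assumes each query to a model returns a bounded per-sample reward in [0, 1], whereas FD and IS are distribution-level scores that need their own confidence bounds. The names `ucb1_select`, `simulated_pull`, and `TRUE_QUALITY` are hypothetical and exist only for this illustration.

```python
import math
import random


def ucb1_select(pull, num_arms, horizon, seed=0):
    """Generic UCB1: repeatedly query the arm with the highest optimistic score.

    `pull(arm, rng)` is a hypothetical callback returning a reward in [0, 1].
    Returns the per-arm pull counts and empirical mean rewards.
    """
    rng = random.Random(seed)
    counts = [0] * num_arms
    means = [0.0] * num_arms
    for t in range(1, horizon + 1):
        if t <= num_arms:
            # Query every arm once so each confidence width is defined.
            arm = t - 1
        else:
            # Optimism: empirical mean plus a bonus that shrinks with more pulls.
            arm = max(
                range(num_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical stand-in for three generative models: each "sample" is scored
# 1.0 with a model-specific probability (an assumption for illustration only).
TRUE_QUALITY = [0.2, 0.5, 0.8]


def simulated_pull(arm, rng):
    return 1.0 if rng.random() < TRUE_QUALITY[arm] else 0.0


counts, means = ucb1_select(simulated_pull, num_arms=3, horizon=5000)
best = max(range(3), key=lambda a: counts[a])
print("pulls per model:", counts, "most-queried model:", best)
```

In this toy setting most of the query budget goes to the strongest arm, which mirrors the abstract's goal of spending few generations on sub-optimal models.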

Paper Structure (29 sections, 18 theorems, 85 equations, 15 figures, 3 tables, 2 algorithms)

INTRODUCTION
RELATED WORK
PRELIMINARIES
Inception Score
Fréchet Distance
ONLINE EVALUATION OF GENERATIVE MODELS
FRÉCHET DISTANCE-BASED ONLINE EVALUATION AND SELECTION
INCEPTION SCORE-BASED ONLINE EVALUATION AND SELECTION
EXPERIMENTAL RESULTS
Results of Online FD-based Evaluation and Selection
Results of Online IS-Based Evaluation and Selection
CONCLUSION
PROOFS IN SECTION 5: FD-BASED EVALUATION
Proof of Theorem 1: Optimistic FD Score
Regret of FD-UCB
...and 14 more sections

Key Result

Theorem 1

Assume for any generator $g$, the (random) embedding $f(X_g) \sim {\mathcal{N}}(\mu_g,\Sigma_g)$ follows a multivariate Gaussian, and the covariance matrix $\Sigma_\textup{r}$ of the real data is positive definite. Then, with probability at least $1-\delta$, we have for any $n\ge 4\bm{r}(\Sigma_g)+\log(3/\delta)$, where $\bm{r}(\Sigma_g):=\frac{\textup{Tr}[\Sigma_g]}{\|\Sigma_g\|_2}$ is the effec
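The effective rank $\bm{r}(\Sigma_g)$ and the sample-size condition $n\ge 4\bm{r}(\Sigma_g)+\log(3/\delta)$ quoted above can be checked numerically. The following is a minimal sketch, assuming $\log$ is the natural logarithm and that the covariance matrix is available as a NumPy array; the helper names `effective_rank` and `min_samples` are hypothetical and not taken from the paper's code.

```python
import math

import numpy as np


def effective_rank(cov: np.ndarray) -> float:
    """r(Sigma) = Tr[Sigma] / ||Sigma||_2 for a symmetric PSD matrix."""
    spectral_norm = np.linalg.norm(cov, ord=2)  # largest singular value
    return float(np.trace(cov) / spectral_norm)


def min_samples(cov: np.ndarray, delta: float) -> int:
    """Smallest integer n with n >= 4 * r(Sigma) + log(3 / delta).

    Assumes the natural logarithm, which is an assumption of this sketch.
    """
    return math.ceil(4.0 * effective_rank(cov) + math.log(3.0 / delta))


# Toy covariance with eigenvalues 4, 2, 2: trace 8, spectral norm 4.
cov = np.diag([4.0, 2.0, 2.0])
r = effective_rank(cov)
n_min = min_samples(cov, delta=0.05)
print(f"effective rank = {r}, minimum n = {n_min}")
```

The effective rank is at most the ambient dimension and is small when a few directions dominate the variance, so the condition on $n$ is milder for embeddings whose covariance spectrum decays quickly.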

Figures (15)

Figure 1: FID-based evaluation and selection among CIFAR10 generative models: The standard offline evaluation requires a large batch of data from every model. In contrast, our proposed online approach leverages the UCB multi-armed bandit strategy to identify the best model with fewer generations from the suboptimal models.
Figure 2: Online FD-based evaluation and selection among standard generative models: The $x$-axis is the number of online steps. At each step, the algorithm samples a batch of five generated images from the chosen model. The image data embeddings are extracted by CLIP (Cherti et al., 2023). Results are averaged over 20 trials.
Figure 3: Online IS-based evaluation and selection among standard generative models: The $x$-axis is the number of steps. At each step, the algorithm samples a batch of five generated images from the chosen model. Results are averaged over 20 trials.
Figure 4: Online IS-based evaluation and selection among variance-controlled FFHQ models: IS-UCB can identify models that generate images with more diversity. Results are averaged over 20 trials.
Figure 5: Online FD-based evaluation and selection among three CIFAR10 models, including LOGAN, RESFLOW, and iDDPM-DDiM (Figure 1). The image data embeddings are extracted by InceptionV3. Results are averaged over 20 trials.
...and 10 more figures

Theorems & Definitions (38)

Theorem 1: Optimistic FD score
Remark 1: Model-dependent parameters
Theorem 2: Optimistic IS
proof
Theorem 3: Regret of FD-UCB
proof
Theorem 4: Concentration of empirical FD (\ref{['emp_fid']})
proof
Lemma 1: $L_2$-norm error for mean and covariance matrix
proof
...and 28 more
