An Interpretable Evaluation of Entropy-based Novelty of Generative Models

Jingwei Zhang; Cheuk Ting Li; Farzan Farnia

TL;DR

The paper presents a principled framework for evaluating the novelty of a generative model relative to a reference distribution, focusing on multi-modal data. It introduces the Kernel-based Entropic Novelty (KEN) score, computed from the spectrum of the differential kernel covariance matrix $\Lambda_{\mathbf{X}|\eta\mathbf{Y}} = C_{\mathbf{X}} - \eta C_{\mathbf{Y}}$, and uses the kernel trick to obtain that spectrum from the differential kernel matrix $K_{\mathbf{X}|\eta\mathbf{Y}}$. Theoretical results show that, for well-separated mixtures, the top eigenvalues of a kernel covariance matrix approximate the mode frequencies, so the positive eigenvalues of $\Lambda_{\mathbf{X}|\eta\mathbf{Y}}$ correspond to modes whose frequency in the test distribution $\mathbf{X}$ exceeds $\eta$ times their frequency in the reference distribution $\mathbf{Y}$. Empirically, the method detects novel modes in synthetic Gaussian mixtures and real image datasets, yields interpretable mode-level clusters, and offers a distribution-based novelty benchmark that complements existing quality and diversity metrics. The authors provide code for reproducing the results and for use in model benchmarking.

Abstract

The massive developments of generative model frameworks require principled methods for the evaluation of a model's novelty compared to a reference dataset. While the literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a reference model has not been adequately explored in the machine learning community. In this work, we focus on the novelty assessment for multi-modal distributions and attempt to address the following differential clustering task: Given samples of a generative model $P_\mathcal{G}$ and a reference model $P_\mathrm{ref}$, how can we discover the sample types expressed by $P_\mathcal{G}$ more frequently than in $P_\mathrm{ref}$? We introduce a spectral approach to the differential clustering task and propose the Kernel-based Entropic Novelty (KEN) score to quantify the mode-based novelty of $P_\mathcal{G}$ with respect to $P_\mathrm{ref}$. We analyze the KEN score for mixture distributions with well-separable components and develop a kernel-based method to compute the KEN score from empirical data. We support the KEN framework by presenting numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes and comparing generative models. The paper's code is available at: www.github.com/buyeah1109/KEN
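To make the kernel-based computation concrete, the following is a minimal NumPy sketch, not the authors' released code. It relies on the identity $\widehat{\Lambda}_{\mathbf{X}|\eta\mathbf{Y}} = A^\top D A$, where $A$ stacks the scaled feature vectors $\phi(\mathbf{x}_i)/\sqrt{n}$ and $\sqrt{\eta}\,\phi(\mathbf{y}_j)/\sqrt{m}$ and $D = \mathrm{diag}(I_n, -I_m)$, so the non-zero eigenvalues can be read from the $(n+m)\times(n+m)$ matrix $D A A^\top$. The Gaussian kernel, the bandwidth `sigma`, the function names, the toy data, and in particular the entropy formula $\sum_i \lambda_i \log(S/\lambda_i)$ with $S=\sum_i \lambda_i$ are assumptions made for illustration; this page does not state the score's exact definition, so it should be checked against Definition 1 of the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def differential_kernel_matrix(x, y, eta=1.0, sigma=1.0):
    """Block matrix D A A^T; it has the same non-zero eigenvalues as C_X - eta * C_Y."""
    n, m = len(x), len(y)
    c = np.sqrt(eta / (n * m))
    kxy = gaussian_kernel(x, y, sigma)
    return np.block([
        [gaussian_kernel(x, x, sigma) / n, c * kxy],
        [-c * kxy.T, -(eta / m) * gaussian_kernel(y, y, sigma)],
    ])

def positive_eigenvalues(k, tol=1e-8):
    """Positive eigenvalues in descending order (the matrix is non-symmetric,
    but its eigenvalues are real up to numerical noise)."""
    eigs = np.real(np.linalg.eigvals(k))
    return np.sort(eigs[eigs > tol])[::-1]

def entropic_score(pos):
    """ASSUMED form of the score: sum_i lambda_i * log(S / lambda_i), S = sum_i lambda_i."""
    total = pos.sum()
    return float(np.sum(pos * np.log(total / pos))) if total > 0 else 0.0

# Hypothetical toy data: X has a shared mode and a novel mode, Y has only the shared mode.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(100, 2)),
               rng.normal([10.0, 0.0], 0.1, size=(100, 2))])
y = rng.normal([0.0, 0.0], 0.1, size=(200, 2))

k = differential_kernel_matrix(x, y, eta=1.0, sigma=1.0)
pos = positive_eigenvalues(k)
score = entropic_score(pos)
print(pos[:3], score)
```

In this toy setup the novel mode carries half of the mass of X and none of Y, so the leading positive eigenvalue should come out at roughly 0.5, in line with the claim that top eigenvalues approximate mode frequencies for well-separated mixtures.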

Paper Structure (25 sections, 5 theorems, 34 equations, 12 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Novelty Evaluation of Generative Models
Kernel Function and Kernel Covariance Matrix
A Spectral Approach to Novelty Evaluation for Mixture Models
Theoretical Analysis of the Proposed Spectral Novelty Evaluation
Computation of the KEN Novelty Score
Numerical Results
Experimental Setup
Numerical Results on Synthetic Gaussian Mixtures
Novelty vs. Diversity Evaluation via KEN and Baseline Metrics
Numerical Results on Real/Generated Image Data
Conclusion
Limitations
...and 10 more sections

Key Result

Proposition 1

Using the above definitions, $\widehat{C}_{\mathbf{X}}$ shares the same eigenvalues with the $n\times n$ normalized kernel matrix $\frac{1}{n}\bigl[k(\mathbf{x}_i,\mathbf{x}_j)\bigr]_{n\times n}$ with every $(i,j)$th entry being $\frac{1}{n}k(\mathbf{x}_i,\mathbf{x}_j)$. Therefore, assuming a normal
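The eigenvalue correspondence in Proposition 1 can be checked numerically when the feature map is explicit and finite-dimensional. The sketch below is an illustration under assumptions, not part of the paper: it uses the hypothetical feature map $\phi(\mathbf{x}) = \mathbf{x}/\lVert\mathbf{x}\rVert$ (so that $k(\mathbf{x},\mathbf{x}) = 1$) and random data. Since the $d\times d$ covariance matrix and the $n\times n$ kernel matrix differ in size, the match is between their non-zero eigenvalues; the remaining $n-d$ eigenvalues of the kernel matrix are zero.

```python
import numpy as np

def unit_norm_features(x):
    """Hypothetical explicit feature map phi(x) = x / ||x||, giving k(x, x) = 1."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))        # n = 50 samples, d = 4 features
phi = unit_norm_features(x)
n, d = phi.shape

cov = phi.T @ phi / n               # empirical kernel covariance matrix, d x d
gram = phi @ phi.T / n              # normalized kernel matrix, n x n

# Both matrices are symmetric, so eigvalsh applies; sort in descending order.
cov_eigs = np.sort(np.linalg.eigvalsh(cov))[::-1]
gram_eigs = np.sort(np.linalg.eigvalsh(gram))[::-1]

print(cov_eigs)
print(gram_eigs[:d])
```

Because $k(\mathbf{x},\mathbf{x}) = 1$ for every sample, the normalized kernel matrix has unit trace, so its eigenvalues are non-negative and sum to one.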

Figures (12)

Figure 1: Experimental results on synthetic Gaussian mixture distributions, including KEN and R-KEN (Reversed-KEN) scores and the principal eigenvalues of the differential kernel covariance matrix $\Lambda_{\mathbf{X}|\eta \mathbf{Y}}$. Top row: Reference (in blue) and test (in red) samples, with $\phi_t$ and $\phi_r$ denoting the frequencies of the test and reference modes. Bottom row: Positive eigenvalues of $\Lambda_{\mathbf{X}|\eta \mathbf{Y}}$.
Figure 2: Top 3 rows: Trends of baseline and KEN scores when evaluating novel yet less diverse distributions with Inception-V3, DINOv2, and CLIP embeddings. Bottom: ImageNet-1K samples from the reference and test distributions. Reference modes: 5 terrestrial animals. Novel modes: 1-3 aquatic animals. $\alpha$ is the ratio of novel modes in the test distribution; $\alpha=0$ and $\alpha=1$ correspond to the pure reference and pure novel distributions, respectively.
Figure 3: Identified top-3 novel modes between image datasets: (Left half) AFHQ w.r.t. ImageNet-dogs, (Right half) AFHQ w.r.t. Wildlife. Inception-V3 embedding is used. The samples shown are the test data with the maximum entry values on the top three principal eigenvectors of the differential kernel matrix $K_{\mathbf{X}\vert \eta \mathbf{Y}}$.
Figure 4: Identified top novel modes between FFHQ-trained generative models. Inception-V3 embedding is used.
Figure 5: KEN scores for real and generated distributions. Left: KEN score on ImageNet-1K. "Intra-class" refers to classes that are similar in taxonomy (e.g. dogs of different breeds). Right: KEN score for truncated StyleGAN-XL, where $\psi$ is the truncation factor and $\psi=1$ recovers the untruncated StyleGAN-XL. "R-KEN" means that the test and reference distributions are switched. Inception-V3 embedding is used.
...and 7 more figures
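
The Figure 3 caption describes how novel modes are visualized: the test samples with the largest entries on the leading eigenvectors of $K_{\mathbf{X}\vert \eta \mathbf{Y}}$ are shown. A minimal sketch of that retrieval step follows, assuming a Gaussian kernel, the block form of the differential kernel matrix with the test samples occupying the first $n$ rows, and a sign convention chosen here for illustration (each eigenvector is flipped so that its largest-magnitude test entry is positive). The function name, parameters, and toy data are hypothetical and do not come from the authors' repository.

```python
import numpy as np

def rbf(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and b."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def top_novel_samples(x, y, eta=1.0, sigma=1.0, n_modes=1, per_mode=5):
    """Return the leading eigenvalues and, per mode, indices of the test
    samples (rows of x) with the largest eigenvector entries."""
    n, m = len(x), len(y)
    c = np.sqrt(eta / (n * m))
    kxy = rbf(x, y, sigma)
    k = np.block([[rbf(x, x, sigma) / n, c * kxy],
                  [-c * kxy.T, -(eta / m) * rbf(y, y, sigma)]])
    vals, vecs = np.linalg.eig(k)            # non-symmetric matrix: general solver
    vals, vecs = np.real(vals), np.real(vecs)
    order = np.argsort(vals)[::-1][:n_modes]
    picks = []
    for j in order:
        v = vecs[:n, j]                      # entries belonging to test samples
        if v[np.argmax(np.abs(v))] < 0:      # assumed sign convention
            v = -v
        picks.append(np.argsort(v)[::-1][:per_mode])
    return vals[order], picks

# Hypothetical toy data: rows 0-99 of x lie in a mode shared with y,
# rows 100-199 lie in a mode that y does not contain.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(100, 2)),
               rng.normal([10.0, 0.0], 0.1, size=(100, 2))])
y = rng.normal([0.0, 0.0], 0.1, size=(200, 2))

vals, picks = top_novel_samples(x, y)
print(vals, picks[0])
```

On this toy data the retrieved indices should all fall in the second half of `x`, the cluster that is absent from `y`. For image data the rows of `x` and `y` would be embeddings (the captions mention Inception-V3, DINOv2, and CLIP) rather than raw pixels.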

Theorems & Definitions (6)

Proposition 1
Definition 1
Theorem 1
Theorem 2
Theorem 3
Theorem 4
