A Unifying Information-theoretic Perspective on Evaluating Generative Models
Alexis Fox, Samarth Swarup, Abhijin Adiga
TL;DR
The paper proposes a unifying information-theoretic framework for the precision- and recall-based metrics used to evaluate generative models. It introduces a tri-dimensional metric composed of Precision Cross-Entropy ($PCE$), Recall Cross-Entropy ($RCE$), and Recall Entropy ($RE$). Grounded in entropy and cross-entropy, the metric assesses fidelity as well as inter- and intra-class diversity, and it supports both population- and sample-level diagnostics. The authors connect existing precision/recall metrics to a general divergence view, derive empirical definitions via kNN estimators, and report experiments in which the new components correlate with human judgments and diagnose failure modes (mode invention, mode dropping, mode shrinkage) more effectively than one-dimensional metrics such as Fréchet Distance (FD). Because the framework is domain-agnostic, it applies across data modalities and is relevant to model selection and auditing. Code and reproducibility resources are provided.
Abstract
Considering the difficulty of interpreting generative model output, there is significant current research focused on determining meaningful evaluation metrics. Several recent approaches utilize "precision" and "recall," borrowed from the classification domain, to individually quantify the output fidelity (realism) and output diversity (representation of the real data variation), respectively. With the increase in metric proposals, there is a need for a unifying perspective, allowing for easier comparison and clearer explanation of their benefits and drawbacks. To this end, we unify a class of kth-nearest-neighbors (kNN)-based metrics under an information-theoretic lens using approaches from kNN density estimation. Additionally, we propose a tri-dimensional metric composed of Precision Cross-Entropy (PCE), Recall Cross-Entropy (RCE), and Recall Entropy (RE), which separately measure fidelity and two distinct aspects of diversity, inter- and intra-class. Our domain-agnostic metric, derived from the information-theoretic concepts of entropy and cross-entropy, can be dissected for both sample- and mode-level analysis. Our detailed experimental results demonstrate the sensitivity of our metric components to their respective qualities and reveal undesirable behaviors of other metrics.
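To make the kNN density estimation idea concrete, the following is a minimal sketch of generic plug-in kNN estimators for entropy and cross-entropy. It is not the paper's definition of PCE, RCE, or RE, which the abstract does not spell out. The function names, the Euclidean distance, the default `k=5`, and the use of the classical estimate $\hat{p}(x) = k / (N V_d r_k(x)^d)$ are all assumptions made for illustration, and the plug-in form carries a small known bias that corrected estimators remove.

```python
import math
import numpy as np

def log_knn_density(queries, refs, k, exclude_self=False):
    """Log of the classical kNN density estimate k / (N * V_d * r_k^d).

    r_k is the Euclidean distance from each query to its k-th nearest
    reference point. Set exclude_self=True when queries and refs are the
    same array, so that a point is not counted as its own neighbor.
    """
    n, dim = refs.shape
    # Brute-force pairwise distances; fine for small illustrative samples.
    dists = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=-1)
    dists.sort(axis=1)
    # Column 0 is the zero self-distance when queries and refs coincide.
    r_k = dists[:, k] if exclude_self else dists[:, k - 1]
    n_eff = n - 1 if exclude_self else n
    # Log-volume of the unit ball in `dim` dimensions.
    log_unit_ball = (dim / 2) * math.log(math.pi) - math.lgamma(dim / 2 + 1)
    return math.log(k) - math.log(n_eff) - log_unit_ball - dim * np.log(r_k)

def knn_cross_entropy(p_samples, q_samples, k=5):
    """Plug-in estimate of H(P, Q) = -E_{x~P}[log q(x)] from samples."""
    return float(-np.mean(log_knn_density(p_samples, q_samples, k)))

def knn_entropy(samples, k=5):
    """Plug-in (leave-one-out) estimate of H(P) = -E_{x~P}[log p(x)]."""
    return float(-np.mean(log_knn_density(samples, samples, k, exclude_self=True)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.standard_normal((500, 2))         # stand-in for real features
    fake = rng.standard_normal((500, 2)) + 3.0   # stand-in for a shifted generator
    print("H(real)       :", knn_entropy(real))
    print("H(real, fake) :", knn_cross_entropy(real, fake))
    print("H(fake, real) :", knn_cross_entropy(fake, real))
```

In this toy setup, a cross-entropy taken in either direction grows when the two sample sets occupy different regions, while the entropy of a single set shrinks when its points collapse toward one another. That is the general kind of signal an entropy and cross-entropy decomposition can separate, under the assumptions stated above.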
