Quantifying the uncertainty of model-based synthetic image quality metrics
Ciaran Bench, Spencer A. Thomas
TL;DR
The paper addresses the reliability of model-based image quality metrics that rely on domain-specific feature embeddings. It applies Monte Carlo dropout to a convolutional autoencoder to model epistemic uncertainty in its latent embeddings, which yields a distribution over Fréchet Autoencoder Distance (FAED) values and two uncertainty measures: the predictive variance of the embeddings ($pVar$) and the standard deviation of the FAED values ($\sigma_{FAED}$). Empirical results show that both measures correlate with the degree to which inputs are out-of-distribution relative to the training data, supporting their use as heuristics for the trustworthiness of the FAED. This framework offers a practical way to gauge the reliability of domain-specific image quality assessments and can inform the selection or adaptation of feature extractors for different domains.
Abstract
The quality of synthetically generated images (e.g. those produced by diffusion models) is often evaluated using information about image contents encoded by pretrained auxiliary models. For example, the Fréchet Inception Distance (FID) uses embeddings from an InceptionV3 model pretrained to classify ImageNet. The effectiveness of this feature embedding model has a considerable impact on the trustworthiness of the calculated metric, which affects its suitability in several domains, including medical imaging. Here, uncertainty quantification (UQ) is used to provide a heuristic measure of the trustworthiness of the feature embedding model and of an FID-like metric called the Fréchet Autoencoder Distance (FAED). We apply Monte Carlo dropout to a feature embedding model (a convolutional autoencoder) to model the uncertainty in its embeddings. The distribution of embeddings for each input is then used to compute a distribution of FAED values. We express uncertainty as the predictive variance of the embeddings and as the standard deviation of the computed FAED values. We find that the magnitude of both measures correlates with the extent to which the inputs are out-of-distribution relative to the model's training data, providing some validation of their ability to assess the trustworthiness of the FAED.
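The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The Fréchet distance between two Gaussians fitted to the embeddings is the standard FID-style form, $\lVert \mu_1 - \mu_2 \rVert^2 + \mathrm{Tr}(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2})$. Everything else is an assumption made for illustration: the convolutional autoencoder is replaced by a hypothetical linear encoder with input dropout (`mc_dropout_embed`), one FAED value is computed per stochastic pass, and $pVar$ is taken as the variance over passes averaged over images and embedding dimensions. The function names, the dropout rate, and the number of passes are all hypothetical.

```python
import numpy as np


def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two embedding sets (rows = samples)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def mc_dropout_embed(images, weights, n_passes, drop_rate, rng):
    """Toy stand-in (assumption) for a dropout-enabled encoder.

    Each pass draws a fresh dropout mask, so repeated passes over the same
    input give different embeddings. Returns shape (n_passes, n_images, dim).
    """
    keep = 1.0 - drop_rate
    passes = []
    for _ in range(n_passes):
        mask = rng.random(images.shape) < keep
        passes.append((images * mask / keep) @ weights)
    return np.stack(passes)


def uncertainty_summary(real, synth, weights, n_passes=20, drop_rate=0.2, seed=0):
    """Compute illustrative pVar and sigma_FAED values for flattened image sets."""
    rng = np.random.default_rng(seed)
    emb_real = mc_dropout_embed(real, weights, n_passes, drop_rate, rng)
    emb_synth = mc_dropout_embed(synth, weights, n_passes, drop_rate, rng)
    # Predictive variance: variance over passes, averaged over images and dims
    # (an assumed aggregation; the paper may define pVar differently).
    p_var = float(emb_synth.var(axis=0).mean())
    # One FAED value per stochastic pass gives a distribution of FAED values.
    faed = np.array(
        [frechet_distance(emb_real[t], emb_synth[t]) for t in range(n_passes)]
    )
    return {
        "pVar": p_var,
        "FAED_mean": float(faed.mean()),
        "sigma_FAED": float(faed.std(ddof=1)),
        "FAED_samples": faed,
    }
```

Calling `uncertainty_summary(real, synth, weights)` with two arrays of flattened images and a weight matrix returns the mean FAED together with both uncertainty measures. With `drop_rate=0.0` every pass is identical, so both measures collapse to zero, which is a quick sanity check that the spread comes from the dropout sampling alone.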
