Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

Lokesh Veeramacheneni, Moritz Wolter, Hildegard Kuehne, Juergen Gall

TL;DR

FID-based metrics rely on pretrained backbones and exhibit domain bias when evaluating non-ImageNet datasets. Fréchet Wavelet Distance (FWD) addresses this by projecting images into wavelet-packet coefficient space via a wavelet-packet transform, computing a Fréchet distance per packet, and averaging over all $P$ packets: $\mathrm{FWD} = \frac{1}{P}\sum_{p=1}^{P} d(\mathcal{N}(\mu_{r_p}, \Sigma_{r_p}), \mathcal{N}(\mu_{g_p}, \Sigma_{g_p}))^2$, where $r$ and $g$ index the real and generated images. The approach is domain-agnostic, interpretable, and computationally efficient. It is demonstrated across diverse datasets and generators, where it is robust to corruptions and domain shifts and aligns with human judgments. By providing per-packet insights and avoiding ImageNet-dependent biases, FWD offers a reliable complement to existing metrics for cross-domain image synthesis evaluation.
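
For reference, and assuming $d$ in the expression above is the usual Fréchet (2-Wasserstein) distance between two Gaussians, as in FID, each per-packet term has the well-known closed form below. The symbols follow the TL;DR; the expansion itself is not quoted from the paper.

$$
d\big(\mathcal{N}(\mu_{r_p}, \Sigma_{r_p}), \mathcal{N}(\mu_{g_p}, \Sigma_{g_p})\big)^2 = \lVert \mu_{r_p} - \mu_{g_p} \rVert_2^2 + \operatorname{tr}\!\Big(\Sigma_{r_p} + \Sigma_{g_p} - 2\big(\Sigma_{r_p}\Sigma_{g_p}\big)^{1/2}\Big)
$$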

Abstract

Modern metrics for generative learning, such as the Fréchet Inception Distance (FID) and the DINOv2-Fréchet Distance (FD-DINOv2), demonstrate impressive performance. However, they suffer from various shortcomings, such as a bias towards specific generators and datasets. To address this problem, we propose the Fréchet Wavelet Distance (FWD), a domain-agnostic metric based on the Wavelet Packet Transform ($W_p$). FWD provides a view across a broad spectrum of frequencies in images at high resolution, preserving both spatial and textural aspects. Specifically, we use $W_p$ to project generated and real images into the packet coefficient space. We then compute the Fréchet distance on the resulting coefficients to evaluate the quality of a generator. The metric is general-purpose and dataset-domain agnostic, since it does not rely on any pre-trained network, and it is more interpretable because the Fréchet distance can be computed per packet. An extensive evaluation of a wide variety of generators across various datasets shows that the proposed FWD generalizes and improves robustness to domain shifts and various corruptions compared to other metrics.
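
The pipeline described in the abstract (project to wavelet packets, fit a Gaussian per packet, average the Fréchet distances) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Haar wavelet, single-channel images, and a level-2 transform for speed. All function names (`haar_split`, `wavelet_packets`, `frechet_sq`, `fwd`) are hypothetical, and any preprocessing the paper applies is omitted.

```python
import numpy as np


def haar_split(x):
    """One 2D orthonormal Haar analysis step on the last two axes.

    Returns four sub-bands, each with half the height and width of x.
    """
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    return [
        (a + b + c + d) / 2,  # approximation (low-pass in both directions)
        (a - b + c - d) / 2,  # detail along the width
        (a + b - c - d) / 2,  # detail along the height
        (a - b - c + d) / 2,  # diagonal detail
    ]


def wavelet_packets(images, level=2):
    """Full packet tree: (N, H, W) -> (N, 4**level, coefficients per packet)."""
    packets = [images]
    for _ in range(level):
        # Unlike a plain wavelet transform, every sub-band is split again.
        packets = [sub for p in packets for sub in haar_split(p)]
    n = images.shape[0]
    return np.stack([p.reshape(n, -1) for p in packets], axis=1)


def frechet_sq(x, y):
    """Squared Fréchet distance between Gaussians fitted to rows of x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # tr((cov_x cov_y)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_x @ cov_y; clip tiny negative values from round-off.
    eig = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eig, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def fwd(real, generated, level=2):
    """Average the per-packet squared Fréchet distances; also return them."""
    pr = wavelet_packets(real, level)
    pg = wavelet_packets(generated, level)
    per_packet = np.array(
        [frechet_sq(pr[:, p], pg[:, p]) for p in range(pr.shape[1])]
    )
    return float(per_packet.mean()), per_packet


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 16, 16))
    shifted = rng.normal(size=(200, 16, 16)) + 1.0  # brightness offset
    score, per_packet = fwd(real, shifted)
    print("FWD:", score, "worst packet:", int(np.argmax(per_packet)))
```

Returning the per-packet values alongside the average mirrors the interpretability argument: in the toy example a constant brightness offset only affects the approximation packet, so inspecting `per_packet` shows which frequency band accounts for the distance.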

Paper Structure

This paper contains 28 sections, 9 equations, 21 figures, 10 tables.

Figures (21)

  • Figure 1: The first two images depict the same person, while the last image depicts a different person. Intuitively, the first two images are more similar than the other pairs of images. When computing the mean squared error between the images using the penultimate InceptionV3 activations or wavelet packets, we observe that the wavelet packets produce a low distance for the first two images, as expected. Surprisingly, according to InceptionV3, the last two images are similar, since both are classified as 'microphone' whereas the first image is classified as 'groom'. Images from Flickr.
  • Figure 2: Illustration of the wavelet packet transform (WPT). For visualization purposes, we depict a level-3 transform. All later experiments use a level-4 transform. Image from WikiCun.
  • Figure 3: FWD computation flowchart. WPT denotes the wavelet-packet transform. Not all packet coefficients are shown; dashed lines indicate omissions. We compute an individual Fréchet distance for each packet and finally average across all packets.
  • Figure 4: Samples from (a) ProjGAN and (b) DDGAN on the CelebA-HQ dataset. FID prefers ProjGAN despite its visual artefacts and floating heads, whereas our metric (FWD) ranks DDGAN higher than ProjGAN.
  • Figure 5: Distribution of ImageNet Top-1 classes predicted by InceptionV3 for real images and for images generated by DDGAN and ProjGAN. (a) depicts the distribution for the CelebA-HQ dataset and (b) shows the distribution for the Agriculture dataset. Although irrelevant for visual quality, the class distribution of ProjGAN aligns more closely with the real distribution than that of DDGAN for both datasets, contributing to a lower FID for ProjGAN.
  • ...and 16 more figures