Disentangling Mean Embeddings for Better Diagnostics of Image Generators
Sebastian G. Gruber, Pascal Tobias Ziegler, Florian Buettner
TL;DR
The paper addresses the lack of region-specific diagnostics when evaluating image generators by disentangling mean embeddings into cluster-wise components using central kernel alignment (CKA). It proves that, under a partition of the pixels with vanishing cross-cluster CKA, the image-wide cosine similarity of mean embeddings (CMS) decomposes into a product of cluster-wise CMS terms, which enables localized performance assessment. In practice, the authors identify pixel clusters via pairwise CKA and hierarchical clustering, then monitor the cluster-wise CMS during training to pinpoint regions where a generator struggles. Experiments on CelebA and ChestMNIST with DCGAN and DDPM architectures illustrate that cluster-level analysis can reveal misbehavior in specific image regions, offering more actionable diagnostics than image-wide metrics such as MMD or FID.
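The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the bandwidth, the plug-in estimator, the toy Gaussian data, and the names `rbf_gram`, `cms`, and `clusterwise_cms` are all assumptions made for illustration, and the pixel partition is given by hand rather than found via CKA and hierarchical clustering. Because an RBF kernel on the full vector factorizes into RBF kernels on disjoint coordinate groups, the full CMS should roughly match the product of cluster-wise CMS values when the clusters are independent.

```python
import numpy as np


def rbf_gram(a, b, bandwidth):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def cms(x, y, bandwidth=2.0):
    """Plug-in estimate of the cosine similarity of kernel mean embeddings."""
    kxy = rbf_gram(x, y, bandwidth).mean()  # estimates <mu_x, mu_y>
    kxx = rbf_gram(x, x, bandwidth).mean()  # estimates ||mu_x||^2
    kyy = rbf_gram(y, y, bandwidth).mean()  # estimates ||mu_y||^2
    return kxy / np.sqrt(kxx * kyy)


def clusterwise_cms(x, y, clusters, bandwidth=2.0):
    """CMS restricted to each pixel cluster (a dict: name -> column indices)."""
    return {name: cms(x[:, idx], y[:, idx], bandwidth) for name, idx in clusters.items()}


# Toy data (assumption): 6 "pixels" in two independent clusters.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 6))
fake = rng.normal(size=(200, 6))
fake[:, 3:] += 2.0  # the hypothetical generator is off only in cluster "b"

clusters = {"a": [0, 1, 2], "b": [3, 4, 5]}
scores = clusterwise_cms(real, fake, clusters)
full = cms(real, fake)
product = float(np.prod(list(scores.values())))

print("cluster-wise CMS:", scores)
print("image-wide CMS:", full, "product of cluster-wise CMS:", product)
```

In this toy setting the low score for cluster `b` localizes the mismatch, whereas the image-wide value alone only signals that something is off somewhere.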
Abstract
The evaluation of image generators remains a challenge due to the limitations of traditional metrics in providing nuanced insights into specific image regions. This is a critical problem as not all regions of an image may be learned with similar ease. In this work, we propose a novel approach to disentangle the cosine similarity of mean embeddings into the product of cosine similarities for individual pixel clusters via central kernel alignment. Consequently, we can quantify the contribution of the cluster-wise performance to the overall image generation performance. We demonstrate how this enhances the explainability and the likelihood of identifying pixel regions of model misbehavior across various real-world use cases.
