GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation
Nicolas Salvy, Hugues Talbot, Bertrand Thirion
TL;DR
Hubness in high-dimensional embedding spaces makes distance-based evaluation of generative models unstable. This work introduces GICDM, a hubness-aware extension of ICDM that first uniformizes the real data density via ICDM, then computes a scaling factor for each generated point using only real data, and finally applies multi-scale filtering to guard against overcorrection. The approach yields more reliable fidelity and coverage metrics and aligns better with human judgments across synthetic and real benchmarks. Because it mitigates hubness while preserving the relative positioning of generated samples, GICDM enables more trustworthy evaluation of high-dimensional generative models.
Abstract
Generative model evaluation commonly relies on high-dimensional embedding spaces to compute distances between samples. We show that dataset representations in these spaces are affected by the hubness phenomenon, which distorts nearest-neighbor relationships and biases distance-based metrics. Building on the classical Iterative Contextual Dissimilarity Measure (ICDM), we introduce Generative ICDM (GICDM), a method that corrects neighborhood estimation for both real and generated data. We also propose a multi-scale extension that guards against overcorrection and improves empirical behavior. Extensive experiments on synthetic and real benchmarks demonstrate that GICDM resolves hubness-induced failures, restores reliable metric behavior, and improves alignment with human judgment.
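
To make the hubness phenomenon concrete, the sketch below measures it with a common diagnostic: the skewness of the k-occurrence distribution, that is, how often each point appears in the k-nearest-neighbor lists of other points. This is an illustration only, not the GICDM procedure. The function names `k_occurrence` and `skewness`, the choice k = 5, and the i.i.d. Gaussian data are all assumptions made for the example and do not come from the paper.

```python
import numpy as np


def k_occurrence(points, k=5):
    """Count how often each point appears in other points' k-NN lists.

    Hypothetical helper for illustration; uses squared Euclidean distance.
    """
    sq_norms = np.sum(points ** 2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    # Indices of the k smallest distances in each row (order within the k is arbitrary)
    knn = np.argpartition(d2, k, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=len(points))


def skewness(counts):
    """Third standardized moment; large positive values indicate hubs."""
    c = counts.astype(float)
    centered = c - c.mean()
    return np.mean(centered ** 3) / np.std(c) ** 3


# Assumed toy data: same sample size, low vs. high dimension.
rng = np.random.default_rng(0)
low = rng.standard_normal((1000, 3))
high = rng.standard_normal((1000, 300))

skew_low = skewness(k_occurrence(low))
skew_high = skewness(k_occurrence(high))
print(f"k-occurrence skewness, d=3:   {skew_low:.2f}")
print(f"k-occurrence skewness, d=300: {skew_high:.2f}")
```

In the high-dimensional sample, a few points (hubs) show up in many neighbor lists while many others show up in almost none, so the skewness is markedly larger than in the low-dimensional sample. Metrics built on nearest-neighbor counts inherit this imbalance, which is the distortion the abstract refers to.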
