NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval
Zengrong Lin, Zheng Wang, Tianwen Qian, Pan Mu, Sixian Chan, Cong Bai
TL;DR
Hubness remains a challenge in cross-modal retrieval, biasing nearest-neighbor relations even for strong alignment models. NeighborRetr mitigates hubness during training by estimating sample centrality, weighting hub learning, balancing neighborhood relations, and enforcing uniform retrieval, all integrated within a two-level visual-text learning pipeline. It introduces three losses (centrality weighting, neighbor adjusting, and uniform regularization) along with a KL term for stability. The method achieves state-of-the-art results on four text-video and three text-image benchmarks and demonstrates robust cross-domain generalization. The work provides empirical evidence that training-time hubness mitigation improves both accuracy and fairness in cross-modal retrieval, and the code is released for reproducibility.
Abstract
Cross-modal retrieval aims to bridge the semantic gap between different modalities, such as visual and textual data, enabling accurate retrieval across them. Despite significant advancements with models like CLIP that align cross-modal representations, a persistent challenge remains: the hubness problem, where a small subset of samples (hubs) dominates as nearest neighbors, leading to biased representations and degraded retrieval accuracy. Existing methods often mitigate hubness through post-hoc normalization techniques, relying on prior data distributions that may not be practical in real-world scenarios. In this paper, we directly mitigate hubness during training and introduce NeighborRetr, a novel method that effectively balances the learning of hubs and adaptively adjusts the relations of various kinds of neighbors. Our approach not only mitigates the hubness problem but also enhances retrieval performance, achieving state-of-the-art results on multiple cross-modal retrieval benchmarks. Furthermore, NeighborRetr demonstrates robust generalization to new domains with substantial distribution shifts, highlighting its effectiveness in real-world applications. We make our code publicly available at: https://github.com/zzezze/NeighborRetr
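To make the hubness problem concrete, the sketch below counts how often each gallery item appears in the queries' top-k lists (the k-occurrence) and reports the skewness of those counts, a common diagnostic in which a strongly positive value indicates that a few hubs absorb most nearest-neighbor slots. This is not NeighborRetr's training procedure or its centrality estimate. The function names `k_occurrence` and `skewness` and the toy similarity matrix are assumptions made only for illustration.

```python
import numpy as np


def k_occurrence(sim, k):
    """Count how often each gallery item lands in a query's top-k list.

    sim: (num_queries, num_gallery) similarity matrix, higher = more similar.
    Hypothetical helper for illustration; not taken from the NeighborRetr code.
    """
    num_gallery = sim.shape[1]
    # Column indices of the k most similar gallery items for every query.
    topk = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=num_gallery)


def skewness(counts):
    """Third standardized moment of the k-occurrence distribution."""
    counts = np.asarray(counts, dtype=np.float64)
    std = counts.std()
    if std == 0:
        return 0.0  # perfectly uniform retrieval, no hubs
    return float(((counts - counts.mean()) ** 3).mean() / std ** 3)


# Toy text-to-video similarities: gallery item 0 is close to every query.
sim = np.array([
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.7, 0.2, 0.6],
])
counts = k_occurrence(sim, k=1)  # counts are [3, 0, 0]: item 0 acts as a hub
print(counts, skewness(counts))
```

In this toy case every query retrieves the same item first, so the counts are maximally unbalanced and the skewness is positive. A retrieval model without hubs would spread the counts evenly and drive the skewness toward zero.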
