Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao, Mengxi Chen, Tianjie Dai, Jiangchao Yao, Bo Han, Ya Zhang, Yanfeng Wang
TL;DR
This work tackles noisy cross-modal correspondence that distorts both cross-modal and intra-modal geometrical structures. It introduces Geometrical Structure Consistency (GSC), a method that simultaneously preserves cross-modal similarities and intra-modal structures, using two noise-robust indicators $y_{\text{CM}}$ and $y_{\text{IM}}$ to infer true correspondences and purify the training losses. Through a purified contrastive cross-modal loss and a purified intra-modal loss, GSC leverages early memorization to establish stable geometry and then refines representations with a temporal ensembling scheme. Empirical results on four benchmarks, including CC152K, show that GSC consistently outperforms state-of-the-art noisy-correspondence methods and remains robust across varying noise levels and real-world data, with practical impact for multimodal retrieval systems.
Abstract
Noisy correspondence, which refers to mismatches in cross-modal data pairs, is prevalent in human-annotated or web-crawled datasets. Prior approaches to leveraging such data mainly apply uni-modal noisy-label learning, without addressing the impact of the noise on both the cross-modal and intra-modal geometrical structures in multimodal learning. We find that both structures, once well established, are effective for discriminating noisy correspondence through structural differences. Inspired by this observation, we introduce a Geometrical Structure Consistency (GSC) method to infer the true correspondence. Specifically, GSC preserves the geometrical structures within and between modalities, allowing noisy samples to be accurately discriminated based on structural differences. Using these inferred true-correspondence labels, GSC refines the learning of geometrical structures by filtering out the noisy samples. Experiments across four cross-modal datasets confirm that GSC effectively identifies noisy samples and significantly outperforms the current leading methods.
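
To make the idea of discriminating noisy pairs by structural differences more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes precomputed image and text embeddings for a batch of pairs. The function names `structure_indicators` and `weighted_contrastive_loss`, the exact formulas for the two scores, the product used to combine them, and the temperature value are all illustrative assumptions; the abstract does not specify how $y_{\text{CM}}$ and $y_{\text{IM}}$ are computed.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def structure_indicators(img, txt):
    """Hypothetical per-pair scores in [0, 1]; higher suggests a cleaner pair."""
    img, txt = l2_normalize(img), l2_normalize(txt)
    # Cross-modal score (assumed form): cosine similarity of each pair (img_i, txt_i).
    y_cm = np.clip(np.sum(img * txt, axis=1), 0.0, 1.0)
    # Intra-modal score (assumed form): how similarly sample i relates to the rest
    # of the batch inside each modality, i.e. the cosine between row i of the
    # image-image and text-text similarity matrices.
    sim_img = l2_normalize(img @ img.T)
    sim_txt = l2_normalize(txt @ txt.T)
    y_im = np.clip(np.sum(sim_img * sim_txt, axis=1), 0.0, 1.0)
    return y_cm, y_im

def weighted_contrastive_loss(img, txt, weights, temperature=0.07):
    """Image-to-text InfoNCE loss with per-pair weights (a stand-in for purification)."""
    img, txt = l2_normalize(img), l2_normalize(txt)
    logits = img @ txt.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_pair = -np.diag(log_prob)  # loss of each matched pair
    return float(np.sum(weights * per_pair) / max(np.sum(weights), 1e-12))

# Toy batch with two concepts, a and b. Pair 2 is mismatched: image of a, text of b.
a, b = [1.0, 0.0], [0.0, 1.0]
img = np.array([a, a, a, b, b, b])
txt = np.array([a, a, b, b, b, b])

y_cm, y_im = structure_indicators(img, txt)
weights = y_cm * y_im  # assumed way of combining the two indicators
plain = weighted_contrastive_loss(img, txt, np.ones(len(img)))
purified = weighted_contrastive_loss(img, txt, weights)
```

In this toy batch the mismatched pair receives the lowest value of both scores, so down-weighting it removes its large contribution from the loss. A real system would compute the scores from learned encoders during training rather than from fixed vectors.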
