A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters
Shasvat Desai, Debasmita Ghose, Deep Chakraborty
TL;DR
This paper addresses the data curation bottleneck in visual contrastive learning by providing a comprehensive taxonomy of positive and negative pair creation strategies. It organizes methods into single-instance versus multi-instance positives and into hard, false, and synthetic negatives, with subcategories including embedding-based, synthetic, supervised, attribute-based, and cross-modal approaches. The analysis highlights key trade-offs between diversity and semantic relevance, as well as computational considerations, and discusses open questions for handling emerging modalities. The resulting framework offers practical guidance for curating informative pairs that support efficient contrastive training and representations with better downstream generalization.
Abstract
Visual contrastive learning learns representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for downstream tasks, data curation becomes essential to its effectiveness. In this survey, we develop a taxonomy of existing techniques for positive and negative pair curation in contrastive learning and describe each technique in detail.
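To make the role of pair design concrete, the following is a minimal sketch of how one positive and a set of negatives enter a contrastive objective. The abstract does not name a specific loss, so an InfoNCE-style loss with cosine similarity is assumed here; the function name `info_nce_loss`, the temperature value, and the toy 2-D embeddings are all hypothetical choices made for illustration.

```python
import numpy as np


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor, one positive, and K negatives.

    anchor, positive: arrays of shape (D,); negatives: array of shape (K, D).
    This is an assumed, illustrative objective, not the survey's own method.
    """
    def l2_normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = l2_normalize(np.asarray(anchor, dtype=float))
    p = l2_normalize(np.asarray(positive, dtype=float))
    n = l2_normalize(np.asarray(negatives, dtype=float))

    # Cosine similarities scaled by temperature; the positive sits at index 0.
    pos_logit = np.dot(a, p) / temperature
    neg_logits = (n @ a) / temperature
    logits = np.concatenate([[pos_logit], neg_logits])

    # Numerically stable negative log-softmax of the positive entry.
    logits = logits - logits.max()
    return float(np.log(np.exp(logits).sum()) - logits[0])


# Toy embeddings (hypothetical values chosen only to illustrate the effect).
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])

easy_negative = np.array([[-1.0, 0.0]])   # far from the anchor
hard_negative = np.array([[0.8, 0.6]])    # close to the anchor
false_negative = positive[None, :]        # same content as the positive

easy_loss = info_nce_loss(anchor, positive, easy_negative)
hard_loss = info_nce_loss(anchor, positive, hard_negative)
false_loss = info_nce_loss(anchor, positive, false_negative)
```

Under these assumptions, the easy negative contributes almost nothing to the loss, the hard negative produces a larger loss and therefore a stronger training signal, and a false negative that duplicates the positive pushes the loss to log 2 no matter how good the embedding is. This is one way to see why the choice of pairs, and not only the encoder, shapes what is learned.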
