Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng, Baoyu Jing, Zihao Li, Hanghang Tong, Jingrui He
TL;DR
This work surveys heterogeneous contrastive learning (CL) for foundation models, addressing both view heterogeneity (across vision, language, and multimodal data) and task heterogeneity (pretraining and downstream tasks). It categorizes methods by data view and by pretraining or downstream objective, detailing classic and modern techniques (e.g., CLIP-style cross-modal CL, dropout-based CL for language models, and CL for graphs and time series) and their variants, including AutoML, prompt learning, and task reformulation strategies. Key contributions include a structured taxonomy of techniques, critical comparisons, and a forward-looking discussion of open challenges such as efficiency, benchmarks, trustworthiness, and the mechanisms linking CL strategies to downstream performance. The survey provides a roadmap for designing scalable, robust, and versatile heterogeneous CL pipelines across diverse data modalities and application domains.
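As a rough illustration of the CLIP-style cross-modal contrastive learning mentioned above (not the survey's own code), the sketch below computes a symmetric InfoNCE loss over a batch of paired image and text embeddings in plain Python. Pairs that share an index are treated as positives, and every other pairing in the batch serves as a negative. The function names and the default temperature of 0.07 are assumptions chosen for illustration; a practical pipeline would use a tensor library and typically learn the temperature.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings (hypothetical helper).

    image_emb[i] and text_emb[i] are assumed to describe the same sample
    (a positive pair); all other image/text pairings act as negatives.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(u, v)) / temperature for v in txts]
              for u in imgs]
    # Transpose to score the text-to-image direction as well.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


if __name__ == "__main__":
    aligned = clip_style_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
    shuffled = clip_style_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
    print(aligned < shuffled)  # True
```

In this toy batch, the loss is close to zero when each image embedding matches its own caption and large when the captions are swapped. That gap is the training signal that pulls paired views together and pushes mismatched ones apart.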
Abstract
In the era of big data and artificial intelligence, an emerging paradigm is to use contrastive self-supervised learning to model large-scale heterogeneous data. Many existing foundation models benefit from the generalization capability of contrastive self-supervised learning, which learns compact, high-quality representations without relying on any label information. Amid the explosive advancements in foundation models across multiple domains, including natural language processing and computer vision, a thorough survey on heterogeneous contrastive learning for foundation models is urgently needed. In response, this survey critically evaluates the current landscape of heterogeneous contrastive learning for foundation models, highlighting the open challenges and future trends of contrastive learning. In particular, we first present how recent contrastive learning-based methods address view heterogeneity and how contrastive learning is applied to train and fine-tune multi-view foundation models. Then, we turn to contrastive learning methods for task heterogeneity, including pretraining tasks and downstream tasks, and show how different tasks are combined with the contrastive learning loss for different purposes. Finally, we conclude this survey by discussing the open challenges and shedding light on the future directions of contrastive learning.
