Weakly-Supervised Contrastive Learning for Imprecise Class Labels

Zi-Hao Zhou, Jun-Jie Wang, Tong Wei, Min-Ling Zhang

TL;DR

This work tackles the limitation of relying on imprecise class labels in contrastive learning by introducing continuous semantic similarity and a graph-based weakly-supervised framework. It unifies self-supervised and weakly-supervised signals through a perturbation graph $\boldsymbol{A} = \alpha \boldsymbol{A}^{u} + \beta \boldsymbol{A}^{wl}(\boldsymbol{S})$, where $\boldsymbol{S}$ encodes semantic similarity, and derives an end-to-end weakly-supervised spectral contrastive loss $\mathcal{L}_{wsc}$. The theory connects graph connectivity properties (Dirichlet conductance, sparsest partition) to linear probe error bounds, and shows that under mild conditions and sufficient data, the method approaches supervised performance. Empirically, the framework yields robust improvements in noisy-label and partial-label settings across standard benchmarks, including challenging high-noise and high-ambiguity scenarios, demonstrating practical utility for real-world weak supervision. The approach opens doors to broader applications such as bag-level weak supervision and multi-modal matching, with a principled balance between clustering quality and estimation error of semantic similarities.
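The graph combination $\boldsymbol{A} = \alpha \boldsymbol{A}^{u} + \beta \boldsymbol{A}^{wl}(\boldsymbol{S})$ can be illustrated on a toy example. The following is a minimal sketch, not the paper's $\mathcal{L}_{wsc}$: the function names, the 4-node matrices, and the values of `alpha` and `beta` are all hypothetical, and the loss shown is the generic matrix-factorization objective $\lVert \bar{\boldsymbol{A}} - \boldsymbol{F}\boldsymbol{F}^{T} \rVert_F^2$ on the degree-normalized graph, which is assumed here only as a stand-in for a spectral contrastive objective.

```python
import numpy as np

def combine_graphs(A_u, A_wl, alpha, beta):
    """Return alpha * A_u + beta * A_wl for two same-shaped weight matrices."""
    A_u = np.asarray(A_u, dtype=float)
    A_wl = np.asarray(A_wl, dtype=float)
    if A_u.shape != A_wl.shape:
        raise ValueError("both graphs must be defined on the same node set")
    return alpha * A_u + beta * A_wl

def normalize_adjacency(A):
    """Symmetric degree normalization D^{-1/2} A D^{-1/2}."""
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    return A * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def factorization_loss(A_norm, F):
    """Squared Frobenius error between the normalized graph and F F^T."""
    return float(np.sum((A_norm - F @ F.T) ** 2))

# Hypothetical self-supervised graph: nodes 0-1 and 2-3 are augmentation pairs.
A_u = np.array([[1.0, 0.5, 0.0, 0.0],
                [0.5, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.5],
                [0.0, 0.0, 0.5, 1.0]])

# Hypothetical weak-label graph built from continuous semantic similarities.
A_wl = np.array([[1.0, 0.2, 0.9, 0.1],
                 [0.2, 1.0, 0.1, 0.8],
                 [0.9, 0.1, 1.0, 0.2],
                 [0.1, 0.8, 0.2, 1.0]])

A = combine_graphs(A_u, A_wl, alpha=1.0, beta=0.5)
A_norm = normalize_adjacency(A)

# Embeddings from the top-k eigenvectors minimize the factorization loss
# among rank-k factors; a trained encoder would approximate them instead.
k = 2
eigvals, eigvecs = np.linalg.eigh(A_norm)
F = eigvecs[:, -k:] * np.sqrt(np.clip(eigvals[-k:], 0.0, None))
loss = factorization_loss(A_norm, F)
```

In this toy setup, the weak-label graph adds edges between nodes that share no augmentation edge (for example, nodes 0 and 2), which is the role the TL;DR attributes to $\boldsymbol{S}$.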

Abstract

Contrastive learning has achieved remarkable success in learning effective representations, with supervised contrastive learning often outperforming self-supervised approaches. However, in real-world scenarios, data annotations are often ambiguous or inaccurate, meaning that class labels may not reliably indicate whether two examples belong to the same class. This limitation restricts the applicability of supervised contrastive learning. To address this challenge, we introduce the concept of "continuous semantic similarity" to define positive and negative pairs. Instead of directly relying on imprecise class labels, we measure the semantic similarity between example pairs, which quantifies how closely they belong to the same category by iteratively refining weak supervisory signals. Based on this concept, we propose a graph-theoretic framework for weakly-supervised contrastive learning, where semantic similarity serves as the graph weights. Our framework is highly versatile and can be applied to many weakly-supervised learning scenarios. We demonstrate its effectiveness through experiments in two common settings, i.e., noisy-label and partial-label learning, where existing methods can be easily integrated to significantly improve performance. Theoretically, we establish an error bound for our approach, showing that it can approximate supervised contrastive learning under mild conditions. The implementation code is available at https://github.com/Speechless-10308/WSC.

Paper Structure

This paper contains 30 sections, 22 theorems, 122 equations, 2 figures, 11 tables, 2 algorithms.

Key Result

Proposition 2.1

For any $\boldsymbol{S}:\mathcal{X} \to \mathbb{R}^{c \times v}$ that satisfies the condition: $\mathbb{P}(\boldsymbol{y} \mid x) = \boldsymbol{S} (x)\mathbb{P}(\boldsymbol{q} \mid x)$ holds almost everywhere in $\mathcal{X}$, the following equation holds: where $S\left((\tilde{x}, \tilde{q}), (\tilde{x}^{\prime}, \tilde{q}^{\prime})\right) = \boldsymbol{S}(\tilde{x})_{:, \tilde{q}} ^{T} \boldsym
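The condition $\mathbb{P}(\boldsymbol{y} \mid x) = \boldsymbol{S}(x)\mathbb{P}(\boldsymbol{q} \mid x)$ can be checked numerically in one simple case. The sketch below assumes class-conditional symmetric label noise with $c = v = 3$, which is an illustrative assumption and not something the proposition states; the helper name, the noise rate, and the posterior `p_y` are hypothetical. Under that assumption, $\mathbb{P}(\boldsymbol{q} \mid x) = \boldsymbol{T}\,\mathbb{P}(\boldsymbol{y} \mid x)$ for a transition matrix $\boldsymbol{T}$, so the constant map $\boldsymbol{S}(x) = \boldsymbol{T}^{-1}$ satisfies the condition whenever $\boldsymbol{T}$ is invertible.

```python
import numpy as np

def symmetric_noise_transition(num_classes, noise_rate):
    """Column-stochastic T with T[q, y] = P(q | y) under symmetric label noise."""
    off_diagonal = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diagonal)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

c = 3
T = symmetric_noise_transition(c, noise_rate=0.4)

# Assumed choice of S: constant in x and equal to the inverse transition matrix.
S = np.linalg.inv(T)

# Hypothetical clean-label posterior P(y | x) for a single example x.
p_y = np.array([0.7, 0.2, 0.1])

# Weak-label posterior P(q | x) induced by the noise model.
p_q = T @ p_y

# The condition requires S(x) P(q | x) to recover P(y | x).
recovered = S @ p_q
```

Note that $\boldsymbol{S}$ has negative off-diagonal entries here, so its columns are not probability vectors; it acts only as a linear map from weak-label posteriors back to class posteriors.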

Figures (2)

  • Figure 1: (a) Self-supervised contrastive learning constructs positive example pairs by using different views of the same image and constructs negative example pairs by using different images. (b) Supervised contrastive learning further regards different images of the same class as additional positive example pairs. (c) Our proposed weakly-supervised contrastive learning abandons the concepts of discrete positive and negative examples.
  • Figure 2: t-SNE visualizations of the learned representations on the CIFAR datasets with 90% symmetric noise. The sub-figures show the results for WSC and ELR+ on CIFAR-10, and for WSC and ELR+ on CIFAR-100 under the same noise condition; for CIFAR-100, only the 20 categories with the highest accuracy for each model are shown.

Theorems & Definitions (46)

  • Proposition 2.1
  • Definition 2.2
  • Proposition 2.3
  • Proposition 2.4
  • proof
  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Theorem 3.4
  • Definition 3.5
  • ...and 36 more