
ProbMCL: Simple Probabilistic Contrastive Learning for Multi-label Visual Classification

Ahmad Sajedi, Samir Khaki, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

TL;DR

This work tackles multi-label image classification by modeling label dependencies and encoder uncertainty without heavy correlation modules. It introduces ProbMCL, which passes two augmented views through an encoder and uses a Mixture Density Network (MDN) to model a Gaussian mixture over features: $p(z|x) = \sum_{k=1}^{C} \pi_k(x)\, \mathcal{N}(z; \mu_k(x), \Sigma_k(x))$, with $C$ equal to the number of classes. Training minimizes a negative log-likelihood for the Gaussian mixture together with a probabilistic contrastive loss. For an anchor $i$, the positive set contains the samples whose label overlap with the anchor meets a threshold, $A(i) = \{ j \in I \setminus \{i\} : D(y_i, y_j) \ge \alpha \}$, and pairs are compared through a similarity between Gaussian mixtures, $\mathrm{Sim}(p_i, p_j)$; the overall objective is $\mathcal{L} = \mathcal{L}_{NLL} + \lambda \mathcal{L}_{PCL}$. In the classification stage, the MDN is discarded and the frozen encoder is paired with a classifier trained with a simple asymmetric loss to perform multi-label classification, achieving state-of-the-art results on MS-COCO and ADP with reduced computational cost. Overall, ProbMCL provides a scalable, interpretable approach that captures label co-dependencies and model uncertainty, with ablations confirming the importance of the overlap metric and hyperparameters.
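To make the mixture density $p(z|x)$ and the term $\mathcal{L}_{NLL}$ concrete, the following is a minimal NumPy sketch of the negative log-likelihood of one feature vector under a Gaussian mixture. It assumes diagonal covariances (the summary only writes a general $\Sigma_k(x)$), and the function name `gmm_nll` and its argument layout are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def gmm_nll(z, pi, mu, var):
    """Negative log-likelihood of z under a Gaussian mixture.

    Assumption for illustration: each covariance is diagonal.
    z:   (d,)   feature vector
    pi:  (C,)   mixture weights, positive and summing to 1
    mu:  (C, d) component means
    var: (C, d) per-dimension variances (the diagonal of each Sigma_k)
    """
    d = z.shape[0]
    # log N(z; mu_k, diag(var_k)), split into normalizer and exponent
    log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(var), axis=1))
    log_exp = -0.5 * np.sum((z - mu) ** 2 / var, axis=1)
    log_comp = np.log(pi) + log_norm + log_exp
    # log-sum-exp over the C components for numerical stability
    m = np.max(log_comp)
    log_p = m + np.log(np.sum(np.exp(log_comp - m)))
    return -log_p

# Usage: a single standard-normal component evaluated at its mean
nll = gmm_nll(np.zeros(2), np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))
```

For this single standard-normal component in two dimensions, the value at the mean is $\log(2\pi) \approx 1.838$, which is a quick sanity check on the normalizer. In a training loop the inputs `pi`, `mu`, and `var` would come from the MDN heads, and the per-sample values would be averaged over the batch.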

Abstract

Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture label dependencies. However, these methods often include complex modules that entail heavy computation and lack interpretability. In this paper, we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel framework to address these challenges in multi-label image classification tasks. Our simple yet effective approach employs supervised contrastive learning, in which samples that share enough labels with an anchor image based on a decision threshold are introduced as a positive set. This structure captures label dependencies by pulling positive pair embeddings together and pushing away negative samples that fall below the threshold. We enhance representation learning by incorporating a mixture density network into contrastive learning and generating Gaussian mixture distributions to explore the epistemic uncertainty of the feature encoder. We validate the effectiveness of our framework through experimentation with datasets from the computer vision and medical imaging domains. Our method outperforms the existing state-of-the-art methods while achieving a low computational footprint on both datasets. Visualization analyses also demonstrate that ProbMCL-learned classifiers maintain a meaningful semantic topology.
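The thresholded positive set and the pull/push behavior described above can be sketched as follows. Two assumptions are made loudly here: the label overlap is measured with the Jaccard index (the text only says samples must "share enough labels"), and the loss is a standard supervised-contrastive form on point embeddings with cosine similarity, used as a stand-in for the paper's probabilistic loss over Gaussian mixtures. The names `positive_mask`, `contrastive_loss`, and `tau` are hypothetical.

```python
import numpy as np

def positive_mask(Y, alpha):
    """Boolean (n, n) mask: [i, j] is True when j != i and overlap(y_i, y_j) >= alpha.

    Y: (n, L) binary multi-hot label matrix.
    Assumption: overlap is the Jaccard index |y_i AND y_j| / |y_i OR y_j|.
    """
    Y = np.asarray(Y, dtype=float)
    inter = Y @ Y.T                                    # shared label counts
    counts = Y.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter  # combined label counts
    overlap = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    mask = overlap >= alpha
    np.fill_diagonal(mask, False)                      # the anchor is never its own positive
    return mask

def contrastive_loss(Z, mask, tau=0.1):
    """SupCon-style loss on point embeddings (stand-in, not the probabilistic variant).

    Z: (n, d) embeddings with n >= 2; mask: output of positive_mask.
    """
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = Z @ Z.T / tau
    not_self = ~np.eye(Z.shape[0], dtype=bool)
    # log-softmax of each anchor's similarities over all other samples
    sim_others = np.where(not_self, sim, -np.inf)
    m = sim_others.max(axis=1, keepdims=True)
    log_denom = m + np.log(np.exp(sim_others - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    n_pos = mask.sum(axis=1)
    has_pos = n_pos > 0                                # skip anchors with an empty positive set
    if not has_pos.any():
        return 0.0
    pos_sum = np.where(mask, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-pos_sum[has_pos] / n_pos[has_pos]))

# Usage: samples 0 and 1 share all labels, sample 2 shares half, sample 3 shares none
Y = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1]])
mask = positive_mask(Y, alpha=0.5)
```

Raising `alpha` shrinks each positive set: with the labels above, sample 2 counts as a positive for anchor 0 at `alpha=0.5` but not at `alpha=0.6`. Anchors with no positives are skipped rather than contributing a zero term, which is one possible design choice among several.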

Paper Structure

This paper contains 8 sections, 4 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Illustration of the ProbMCL framework in (a) the contrastive stage and (b) the classification stage. In the classification stage, the MDN is discarded and the trained encoder is retained. (c) The internal architecture of the Mixture Density Network (MDN).
  • Figure 2: The effect of loss hyperparameters on the mAP score (%) for the MS-COCO dataset (Lin et al., 2014).
  • Figure 3: Visualization analyses of baseline (ASL) and the proposed method across MS-COCO (top) and ADP (bottom) datasets.