Exploring Partial Multi-Label Learning via Integrating Semantic Co-occurrence Knowledge

Xin Wu, Fei Teng, Yue Feng, Kaibo Shi, Zhuosheng Lin, Ji Zhang, James Wang

TL;DR

This paper introduces a bi-dominant prompter module, which leverages an off-the-shelf multimodal model to capture text-image correlations and enhance semantic alignment, and develops a cross-modality fusion module that jointly models inter-label correlations, inter-instance relationships, and co-occurrence patterns across instance-label assignments.

Abstract

Partial multi-label learning aims to extract knowledge from incompletely annotated data, which includes known correct labels, known incorrect labels, and unknown labels. The core challenge lies in accurately identifying the ambiguous relationships between labels and instances. In this paper, we emphasize that matching co-occurrence patterns between labels and instances is key to addressing this challenge. To this end, we propose the Semantic Co-occurrence Insight Network (SCINet), a novel and effective framework for partial multi-label learning. Specifically, SCINet introduces a bi-dominant prompter module, which leverages an off-the-shelf multimodal model to capture text-image correlations and enhance semantic alignment. To reinforce instance-label interdependencies, we develop a cross-modality fusion module that jointly models inter-label correlations, inter-instance relationships, and co-occurrence patterns across instance-label assignments. Moreover, we propose an intrinsic semantic augmentation strategy that enhances the model's understanding of intrinsic data semantics by applying diverse image transformations, thereby fostering a synergistic relationship between label confidence and sample difficulty. Extensive experiments on four widely used benchmark datasets demonstrate that SCINet surpasses state-of-the-art methods.
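
To make the partial-label setting described in the abstract concrete, the following is a minimal sketch and not SCINet itself. It assumes an illustrative encoding (1 = known correct, -1 = known incorrect, 0 = unknown) and uses two hypothetical helpers, `masked_bce` and `cooccurrence_counts`, to show how a loss can skip unknown labels and how label co-occurrence can be counted from known-correct labels only.

```python
import math

# Assumed encoding for illustration only (not taken from the paper):
#   1 = known correct label, -1 = known incorrect label, 0 = unknown label.

def masked_bce(probs, labels):
    """Mean binary cross-entropy over known labels; unknown labels (0) are skipped."""
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y == 0:
            continue  # an unknown label contributes nothing to the loss
        p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
        total += -math.log(p) if y == 1 else -math.log(1 - p)
        count += 1
    return total / count if count else 0.0

def cooccurrence_counts(label_rows):
    """Count how often two distinct labels are both known correct in one instance."""
    n = len(label_rows[0])
    counts = [[0] * n for _ in range(n)]
    for row in label_rows:
        positives = [j for j, y in enumerate(row) if y == 1]
        for a in positives:
            for b in positives:
                if a != b:
                    counts[a][b] += 1
    return counts

# Three instances, three labels, partially annotated.
labels = [
    [1, 1, 0],
    [1, 0, -1],
    [1, 1, -1],
]
counts = cooccurrence_counts(labels)
loss = masked_bce([0.9, 0.2, 0.5], labels[1])  # the 0.2 prediction is ignored
```

Counting only known-correct pairs underestimates true co-occurrence when many labels are unknown, which is one reason a method in this setting might draw on additional semantic knowledge rather than raw counts alone.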

Paper Structure

This paper contains 25 sections, 13 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Comparison of the matching patterns between the instance space and the label space. "(a)" shows a single instance corresponding to one label, while "(b)" depicts multiple instances corresponding to one label. Both "✓" and "×" labels are considered known labels, whereas the "?" label is regarded as an unknown label. CL denotes complete label, while PL signifies partial label.
  • Figure 2: Overview of the proposed SCINet method. $X$ consists of instances $s_{i}$, each with known labels $c_{i}$ and unknown labels $u_{i}$. $T^{*}$ denotes the label confidence matrix.
  • Figure 3: Comparison of SCINet with current advanced methods in terms of mAP on the VOC2007 and COCO2014 datasets, with the known label proportion increasing in steps of 20% from 10% to 50%.
  • Figure 4: Performance comparison of GCN layer depth and single label settings on the COCO2014 dataset.
  • Figure 5: The t-SNE visualization of specific categories, including "person," "chair," and "motorcycle," from the VOC2007 dataset for both the Baseline and SCINet models.
  • ...and 5 more figures