Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels

Tianshui Chen; Tao Pu; Lingbo Liu; Yukai Shi; Zhijing Yang; Liang Lin

TL;DR

This work tackles multi-label recognition with partial labels (MLR-PL) by introducing Heterogeneous Semantic Transfer (HST), which exploits both intra-image label co-occurrence and cross-image category prototypes to generate pseudo labels for unknown labels. The framework combines a semantic-aware representation learning (SARL) backbone with two transfer modules: IST, which learns image-specific label co-occurrences, and CST, which learns category-specific prototypes and feature-prototype similarities across images. A differential threshold learning (DTL) mechanism learns the pseudo-label thresholds automatically, and the overall objective combines the supervised partial BCE loss with the IST, CST, and DTL losses for end-to-end training. Extensive experiments on MS-COCO, VG-200, and Pascal VOC 2007 show that HST consistently outperforms state-of-the-art methods across varying known-label proportions, and ablations confirm the contributions of IST, CST, SARL, and adaptive thresholding. The approach makes MLR-PL more practical by reducing annotation costs, and the authors provide open-source code for reproducibility.
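The supervised part of the objective, a binary cross-entropy computed only over known labels, can be illustrated with a minimal sketch. The label encoding (1 for present, -1 for absent, 0 for unknown), the function name `partial_bce`, and the averaging over known labels are assumptions made for illustration; the summary above does not specify the paper's exact normalization.

```python
import math

def partial_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over known labels only.

    probs  : predicted probabilities, one per category
    labels : 1 = present, -1 = absent, 0 = unknown (assumed encoding)
    """
    total, known = 0.0, 0
    for p, y in zip(probs, labels):
        if y == 0:
            continue  # unknown labels contribute nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
        known += 1
    return total / known if known else 0.0

# The second category is unknown, so its prediction is ignored.
loss = partial_bce([0.5, 0.9, 0.1], [1, 0, -1])
```

Once pseudo labels are generated for some unknown entries, the same loss could be applied to them by replacing the corresponding 0 values with 1.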

Abstract

Multi-label image recognition with partial labels (MLR-PL), in which some labels are known while others are unknown for each image, may greatly reduce the cost of annotation and thus facilitate large-scale MLR. We find that strong semantic correlations exist within each image and across different images, and these correlations can help transfer the knowledge possessed by the known labels to retrieve the unknown labels and thus improve the performance of the MLR-PL task (see Figure 1). In this work, we propose a novel heterogeneous semantic transfer (HST) framework that consists of two complementary transfer modules that explore both within-image and cross-image semantic correlations to transfer the knowledge possessed by known labels to generate pseudo labels for the unknown labels. Specifically, an intra-image semantic transfer (IST) module learns an image-specific label co-occurrence matrix for each image and maps the known labels to complement the unknown labels based on these matrices. Additionally, a cross-image transfer (CST) module learns category-specific feature-prototype similarities and then helps complement the unknown labels that have high degrees of similarity with the corresponding prototypes. Finally, both the known and generated pseudo labels are used to train MLR models. Extensive experiments conducted on the Microsoft COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed HST framework achieves superior performance to that of current state-of-the-art algorithms. Specifically, it obtains mean average precision (mAP) improvements of 1.4%, 3.3%, and 0.4% on the three datasets over the results of the best-performing previously developed algorithm.
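As a rough illustration of the intra-image transfer idea, the sketch below promotes an unknown label to a positive pseudo label when some known-positive label co-occurs with it strongly enough. Everything here is an assumption for illustration: in the paper the co-occurrence matrix is predicted per image by a learned module and the threshold is learned, whereas this sketch takes a fixed matrix `cooc` and a fixed `threshold`, and the function name `ist_pseudo_labels` is hypothetical.

```python
def ist_pseudo_labels(labels, cooc, threshold=0.75):
    """Fill unknown labels from known positives via a co-occurrence matrix.

    labels    : 1 = present, -1 = absent, 0 = unknown (assumed encoding)
    cooc[i][j]: estimated probability that category j is present
                given that category i is present (assumed given here)
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    pseudo = list(labels)
    for j, y in enumerate(labels):
        if y != 0:
            continue  # known labels are never overwritten
        # Strongest evidence for category j among the known positives.
        score = max((cooc[i][j] for i in positives), default=0.0)
        if score >= threshold:
            pseudo[j] = 1
    return pseudo

labels = [1, 0, 0, -1]
cooc = [
    [1.0, 0.9, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.1, 0.4, 1.0],
]
pseudo = ist_pseudo_labels(labels, cooc)
```

In this example the second category becomes a positive pseudo label because it co-occurs strongly with the known-positive first category, while the third category stays unknown.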

Paper Structure (22 sections, 16 equations, 6 figures, 8 tables)

This paper contains 22 sections, 16 equations, 6 figures, 8 tables.

Introduction
Related Works
HST
SARL
IST
CST
DTL
Optimization
Experiments
Experimental Settings
Comparison with State-of-the-Art Methods
Performance on MS-COCO
Performance on VG-200
Performance on Pascal VOC 2007
Ablation Studies
...and 7 more sections

Figures (6)

Figure 1: Illustration of MLR with complete and partial label settings. For the complete label setting, $\checkmark$ denotes that the corresponding category exists, while $\times$ denotes that it does not exist. For the partial label setting, $\textbf{?}$ denotes that the original label is missing and we do not know whether it exists or does not exist. Both the $\checkmark$ and $\times$ labels are regarded as known labels, while the $\textbf{?}$ labels are regarded as unknown labels.
Figure 2: Two examples of images with partial labels (the unknown labels are highlighted in red). Strong semantic correlations exist within each image and across different images, and we can mine these correlations to help complement the unknown labels.
Figure 3: An overall illustration of the proposed HST framework. The upper part is the overall pipeline that consists of the IST and CST modules for generating pseudo labels, which are then fed to the supervised training process of the MLR model. The lower part contains the detailed implementations of the IST and CST modules. The IST module first predicts label co-occurrence matrices and then maps the known labels to complement the unknown labels. The CST module first learns category-specific feature and prototype similarities across different images and then maps them to generate pseudo labels.
Figure 4: Several examples of images with partial labels (unknown labels are highlighted in red) and their corresponding co-occurrence matrices: (a) the statistical co-occurrence matrix, (b) the image-specific co-occurrence matrix generated by IST without $\mathcal{L}_{ist}$, (c) the image-specific co-occurrence matrix generated by IST.
Figure 5: The performance of the HST framework with only the CST module under different numbers of prototypes. We present the average mAP over all known-label proportion settings and the mAPs achieved under the 20%, 50%, and 80% known-label settings on the MS-COCO dataset.
...and 1 more figure
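The cross-image transfer step described in the Figure 3 caption can be sketched in the same spirit: each unknown category is compared against that category's prototypes, and a high similarity yields a positive pseudo label. This is a sketch under stated assumptions, not the authors' implementation: cosine similarity stands in for the learned feature-prototype similarity, the threshold is fixed instead of learned, and the names `cosine` and `cst_pseudo_labels` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cst_pseudo_labels(labels, features, prototypes, threshold=0.9):
    """Fill unknown labels by similarity to category prototypes.

    labels       : 1 = present, -1 = absent, 0 = unknown (assumed encoding)
    features[c]  : category-specific feature vector of this image
    prototypes[c]: list of prototype vectors for category c
    """
    pseudo = list(labels)
    for c, y in enumerate(labels):
        if y != 0:
            continue  # known labels are never overwritten
        best = max((cosine(features[c], p) for p in prototypes[c]), default=0.0)
        if best >= threshold:
            pseudo[c] = 1
    return pseudo

labels = [0, 0, 1]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototypes = [
    [[2.0, 0.1]],               # close to features[0]
    [[1.0, 0.0], [1.0, 0.2]],   # far from features[1]
    [[1.0, 1.0]],
]
pseudo = cst_pseudo_labels(labels, features, prototypes)
```

Keeping several prototypes per category and taking the best match mirrors the Figure 5 experiment, which varies the number of prototypes.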
