Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels
Tianshui Chen, Tao Pu, Lingbo Liu, Yukai Shi, Zhijing Yang, Liang Lin
TL;DR
This work tackles multi-label recognition with partial labels (MLR-PL) by introducing Heterogeneous Semantic Transfer (HST), which exploits both intra-image label co-occurrence and cross-image category prototypes to generate high-quality pseudo labels for unknown labels. The framework combines a semantic-aware representation learning (SARL) backbone with two transfer modules: an intra-image semantic transfer (IST) module, which learns image-specific label co-occurrences, and a cross-image semantic transfer (CST) module, which learns category-specific prototypes and cross-image similarities. A differential threshold learning (DTL) mechanism learns the thresholds used for pseudo-label generation automatically, and the overall objective combines a supervised partial binary cross-entropy (BCE) loss with the IST/CST and DTL losses, trained end to end. Extensive experiments on MS-COCO, VG-200, and VOC 2007 show that HST consistently outperforms state-of-the-art methods across varying proportions of known labels, and ablations confirm the contributions of IST, CST, SARL, and adaptive thresholding. By recovering unknown labels from partial annotations, the approach makes MLR-PL more practical, reducing annotation cost while maintaining strong recognition performance; the authors release open-source code for reproducibility.
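The partial BCE term in the objective can be illustrated with a small sketch. This is a minimal, hypothetical Python example, not the authors' released code: it assumes labels are encoded as 1 (known positive), -1 (known negative), and 0 (unknown), averages the loss over known entries only, and uses a hypothetical helper `fill_unknown` to show how pseudo labels could fill the unknown slots before the loss is computed, without ever overwriting a known label.

```python
import math
from typing import List, Sequence


def bce_with_logits(z: float, target: float) -> float:
    """Numerically stable binary cross-entropy on a single logit."""
    # max(z, 0) - z*t + log(1 + exp(-|z|)) == -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))]
    return max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))


def partial_bce(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Average BCE over known labels only (assumed encoding: 1 / -1 / 0 = unknown)."""
    losses = [
        bce_with_logits(z, 1.0 if y == 1 else 0.0)
        for z, y in zip(logits, labels)
        if y != 0  # unknown labels are skipped and contribute no loss
    ]
    return sum(losses) / len(losses) if losses else 0.0


def fill_unknown(labels: Sequence[int], pseudo: Sequence[int]) -> List[int]:
    """Hypothetical helper: copy pseudo labels into unknown slots only."""
    return [p if y == 0 else y for y, p in zip(labels, pseudo)]


if __name__ == "__main__":
    labels = [1, 0, -1, 0]
    pseudo = [0, 1, 1, 0]  # e.g. proposals from IST/CST; 0 = no proposal
    merged = fill_unknown(labels, pseudo)
    print(merged)  # [1, 1, -1, 0]
    print(partial_bce([2.0, 1.5, -2.0, 0.3], merged))
```

Masking rather than treating unknown labels as negatives is the key design choice here: an unknown label that is actually present would otherwise be pushed toward a false negative.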
Abstract
Multi-label image recognition with partial labels (MLR-PL), in which only some labels are known for each image while the others are unknown, can greatly reduce annotation cost and thus facilitate large-scale MLR. We find that strong semantic correlations exist both within each image and across different images, and these correlations can help transfer the knowledge possessed by the known labels to retrieve the unknown labels, thereby improving MLR-PL performance (see Figure 1). In this work, we propose a novel heterogeneous semantic transfer (HST) framework consisting of two complementary transfer modules that exploit within-image and cross-image semantic correlations, respectively, to transfer the knowledge of known labels and generate pseudo labels for the unknown ones. Specifically, an intra-image semantic transfer (IST) module learns an image-specific label co-occurrence matrix for each image and, based on these matrices, maps the known labels to complement the unknown labels. Additionally, a cross-image semantic transfer (CST) module learns category-specific feature-prototype similarities and complements the unknown labels that are highly similar to the corresponding prototypes. Finally, both the known labels and the generated pseudo labels are used to train MLR models. Extensive experiments on the Microsoft COCO, Visual Genome, and Pascal VOC 2007 datasets show that HST outperforms current state-of-the-art algorithms, improving mean average precision (mAP) by 1.4%, 3.3%, and 0.4%, respectively, over the best-performing previous algorithm.
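To make the two transfer routes concrete, here is a simplified Python sketch of how IST-style and CST-style pseudo labels could be derived. It is an illustration under stated assumptions, not the paper's formulation: the co-occurrence matrix is taken as given (entry [i][j] is read as a hypothetical estimate of how likely label j is present when label i is), IST scores an unknown label by its strongest co-occurrence with any known positive, CST uses a single mean-feature prototype per category with cosine similarity, and both use a fixed, hand-picked threshold `theta` rather than a learned one. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def ist_pseudo_labels(labels: Sequence[int],
                      cooccur: Sequence[Sequence[float]],
                      theta: float) -> List[int]:
    """IST-style sketch: propagate known positives through an image-specific matrix."""
    pseudo = [0] * len(labels)
    known_pos = [i for i, y in enumerate(labels) if y == 1]
    if not known_pos:
        return pseudo  # no known positive to transfer knowledge from
    for j, y in enumerate(labels):
        if y != 0:
            continue  # only unknown labels receive pseudo labels
        # Strongest co-occurrence between any known positive i and candidate j.
        score = max(cooccur[i][j] for i in known_pos)
        if score > theta:
            pseudo[j] = 1
    return pseudo


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_prototype(features: Sequence[Sequence[float]]) -> List[float]:
    """Assumed simplification: one prototype = mean feature of known-positive images."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def cst_pseudo_labels(labels: Sequence[int],
                      category_feats: Sequence[Sequence[float]],
                      prototypes: Sequence[Sequence[float]],
                      theta: float) -> List[int]:
    """CST-style sketch: flag unknown category c if its feature is close to prototype c."""
    return [
        1 if y == 0 and cosine(f, p) > theta else 0
        for y, f, p in zip(labels, category_feats, prototypes)
    ]


if __name__ == "__main__":
    labels = [1, 0, 0]  # category 0 known positive; categories 1 and 2 unknown
    cooccur = [[1.0, 0.9, 0.1],
               [0.5, 1.0, 0.2],
               [0.1, 0.3, 1.0]]
    print(ist_pseudo_labels(labels, cooccur, theta=0.5))  # [0, 1, 0]

    prototypes = [mean_prototype([[1.0, 0.0], [1.0, 0.0]]),
                  mean_prototype([[0.0, 1.0], [0.0, 3.0]]),
                  [1.0, 1.0]]
    feats = [[1.0, 0.0], [0.0, 5.0], [1.0, -1.0]]
    print(cst_pseudo_labels(labels, feats, prototypes, theta=0.8))  # [0, 1, 0]
```

In this sketch both modules only propose positives for unknown labels; anything they do not flag stays unknown and is simply excluded from the partial loss.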
