Collaborative Learning of Semantic-Aware Feature Learning and Label Recovery for Multi-Label Image Recognition with Incomplete Labels
Zhi-Fen He, Ren-Dong Xie, Bo Li, Bin Liu, Jin-Yan Hu
TL;DR
Addresses multi-label image recognition with incomplete labels by proposing CLSL, a framework that jointly learns semantic-aware features and recovers missing labels in a collaborative loop. It combines Semantic-Related Feature Learning (SRFL) and Semantic-Guided Feature Enhancement (SGFE) to align visual and semantic spaces, and uses a Label Recovery module to generate pseudo-labels guiding training via an ASL-based loss on refined and global predictions. The joint optimization enables mutual reinforcement between feature discriminability and label completeness, improving performance under sparse supervision. Experiments on MS-COCO, VOC2007, and NUS-WIDE demonstrate state-of-the-art results and robustness to label sparsity, highlighting practical impact for weakly supervised multi-label recognition.
Abstract
Multi-label image recognition with incomplete labels is a critical learning task and has emerged as a focal topic in computer vision. However, this task is confronted with two core challenges: semantic-aware feature learning and missing label recovery. In this paper, we propose a novel Collaborative Learning of Semantic-aware feature learning and Label recovery (CLSL) method for multi-label image recognition with incomplete labels, which unifies the two aforementioned challenges into a unified learning framework. More specifically, we design a semantic-related feature learning module to learn robust semantic-related features by discovering semantic information and label correlations. Then, a semantic-guided feature enhancement module is proposed to generate high-quality discriminative semantic-aware features by effectively aligning visual and semantic feature spaces. Finally, we introduce a collaborative learning framework that integrates semantic-aware feature learning and label recovery, which can not only dynamically enhance the discriminability of semantic-aware features but also adaptively infer and recover missing labels, forming a mutually reinforced loop between the two processes. Extensive experiments on three widely used public datasets (MS-COCO, VOC2007, and NUS-WIDE) demonstrate that CLSL outperforms the state-of-the-art multi-label image recognition methods with incomplete labels.
