Determined Multi-Label Learning via Similarity-Based Prompt
Meng Wei, Zhongnian Li, Peng Ying, Yong Zhou, Xinzheng Xu
TL;DR
This work addresses the high cost of fully annotating multi-label data by introducing Determined Multi-Label Learning (DMLL), in which each training example is paired with a single binary determined label for one randomly chosen candidate class. It derives a risk-consistent estimator that reweights the loss using $p(y^{\gamma}=1|x)$ and $p(y^{\gamma}=0|x)$, enabling learning from determined-labeled data within a binary cross-entropy framework; when these probabilities are unknown, they are estimated from the sigmoid outputs of a RAM-based image model. To further improve performance, the paper introduces a similarity-based prompt (SBP) mechanism that augments each target label with semantically similar labels drawn from the large RAM label space, using the CLIP text encoder to compute similarities and learning an optimal prompt $P^*$. Empirical results on VOC, COCO, NUS, and CUB show that DMLL consistently outperforms state-of-the-art weakly supervised approaches on mean average precision (mAP), ranking loss, one error, and coverage, supporting both the theoretical guarantees and the practical value of SBP in large-scale vision-language settings.
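The similarity step behind SBP can be illustrated with a minimal sketch. It only shows how labels similar to a target label might be ranked by cosine similarity; it does not learn the prompt $P^*$. The embedding vectors below are made-up toy values standing in for CLIP text-encoder outputs, and the helper names and the cutoff `k` are hypothetical, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_similar(target, candidates, embeddings, k=2):
    """Return the k candidate labels most similar to `target`.

    `embeddings` maps label -> vector; in the paper's setting these would
    come from a text encoder (assumption: toy vectors are used here).
    """
    scored = [
        (cosine(embeddings[target], embeddings[c]), c)
        for c in candidates
        if c != target
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]

# Toy embeddings (hypothetical values, for illustration only).
toy_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "wolf": [0.7, 0.3, 0.1],
    "car": [0.0, 0.1, 0.95],
}

similar = top_k_similar("dog", ["puppy", "wolf", "car"], toy_embeddings, k=2)
print(similar)
```

With these toy vectors, "puppy" and "wolf" rank above "car" for the target "dog", which is the kind of semantic neighborhood the augmentation is meant to capture.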
Abstract
In multi-label classification, each training instance is associated with multiple class labels simultaneously. Unfortunately, collecting fully precise class labels for every training instance is time- and labor-consuming in real-world applications. To alleviate this problem, a novel labeling setting termed \textit{Determined Multi-Label Learning} (DMLL) is proposed, aiming to reduce the labeling cost inherent in multi-label tasks. In this setting, each training instance is associated with a \textit{determined label} (either "Yes" or "No"), which indicates whether the training instance contains the provided class label. The provided class label is selected uniformly at random from the whole candidate label set. Moreover, each training instance only needs to be determined once, which significantly reduces the annotation cost for multi-label datasets. In this paper, we theoretically derive a risk-consistent estimator to learn a multi-label classifier from these determined-labeled training data. Additionally, we introduce a similarity-based prompt learning method for the first time, which minimizes the risk-consistent loss of large-scale pre-trained models to learn a supplemental prompt with richer semantic information. Extensive experiments demonstrate the efficacy of our approach, which achieves superior performance compared with existing state-of-the-art methods.
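To make the labeling setting concrete, the following is a minimal sketch of how determined labels could be simulated from a fully labeled dataset: one class is drawn uniformly at random per instance, and the recorded answer is "Yes" (1) or "No" (0). The function name, the toy label sets, and the seed are assumptions for illustration and are not part of the paper.

```python
import random

def make_determined_labels(full_labels, num_classes, seed=0):
    """Simulate determined labels from full multi-label annotations.

    For each instance, a class index gamma is drawn uniformly from
    range(num_classes); the answer is 1 ("Yes") if the instance contains
    that class and 0 ("No") otherwise. Each instance is queried once.
    """
    rng = random.Random(seed)
    determined = []
    for label_set in full_labels:
        gamma = rng.randrange(num_classes)        # uniformly chosen class
        answer = 1 if gamma in label_set else 0   # "Yes" / "No"
        determined.append((gamma, answer))
    return determined

# Toy ground-truth label sets for four instances (hypothetical data).
full_labels = [{0, 2}, {1}, {0, 1, 3}, set()]
pairs = make_determined_labels(full_labels, num_classes=4)

for (gamma, answer), labels in zip(pairs, full_labels):
    print(f"queried class {gamma}: {'Yes' if answer else 'No'} (true labels: {sorted(labels)})")
```

Each instance ends up with exactly one (class, answer) pair, which is the only supervision a DMLL learner would see in this simulated setup.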
