A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends
Xiang Li, Jiexi Liu, Xinrui Wang, Songcan Chen
TL;DR
This survey defines incomplete multi-label learning (InMLL) as learning from training data in which only a subset of the true labels is observed, with the goal of predicting all relevant labels for new instances. It provides formal definitions, a twofold taxonomy (data-oriented and algorithm-oriented), and a synthesis of core challenges such as label correlation inconsistency, imbalanced labeling, and noisy labels, along with practical issues in model assumptions and selection cost. The paper also discusses single positive multi-label learning (SPMLL) as a specialized setting, surveys application domains (CV, NLP, data mining, and medicine), and outlines future directions, including non-random missingness, weakly semi-supervised settings, open-set InMLL, and multimodal and robust learning approaches. Together, these contributions offer a structured roadmap for researchers and practitioners to design cost-efficient, scalable InMLL methods and apply them to real-world tasks.
Abstract
In practice, data are often associated with multiple labels, which has made multi-label learning (MLL) a prominent research topic. The last two decades have witnessed the success of MLL, a success that depends on complete and accurate supervision. However, obtaining such supervision in practice is laborious and sometimes even impossible. To circumvent this dilemma, incomplete multi-label learning (InMLL) has emerged, aiming to learn from incompletely labeled data. To date, a large number of InMLL methods have been proposed to narrow the performance gap with complete MLL, yet a systematic review of InMLL is still absent. In this paper, we not only attempt to fill this gap but also strive to pave the way for innovative research. Specifically, we trace the origin of InMLL, analyze its challenges, and construct a taxonomy of InMLL from both data-oriented and algorithm-oriented perspectives. We also present real-world applications of InMLL in various domains. More importantly, we highlight several potential future trends, including four open problems that are more in line with practice and three under-explored or unexplored techniques for addressing the challenges of InMLL, which may shed new light on novel research directions in the field.
