
A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends

Xiang Li, Jiexi Liu, Xinrui Wang, Songcan Chen

TL;DR

This survey defines incomplete multi-label learning (InMLL) as learning from training data where only a subset of true labels is observed, and one aims to predict all relevant labels for new instances. It provides formal definitions, a twofold taxonomy (data-oriented and algorithm-oriented), and a synthesis of core challenges such as label correlation inconsistency, imbalanced labeling, and noisy labels, along with practical issues in model assumptions and selection cost. The paper also discusses single positive multi-label learning (SPMLL) as a specialized setting, surveys application domains (CV, NLP, data mining, and medicine), and outlines future directions, including non-random missingness, weakly semi-supervised settings, open-set InMLL, multimodal and robust learning approaches. Together, these contributions offer a structured roadmap for researchers and practitioners to design cost-efficient, scalable InMLL methods and apply them to real-world tasks.
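To make the setting concrete, the following is a minimal sketch of incomplete multi-label data and of two simple ways to score predictions against it. It is an illustration under stated assumptions, not a method taken from the survey: the toy label matrix, the use of `None` for missing entries, and the helper names `masked_bce` and `assume_negative` are all hypothetical.

```python
import math

# Hypothetical toy data: 3 instances, 4 labels.
# 1 = observed relevant label, None = missing (unknown) entry.
observed = [
    [1, None, None, 1],
    [None, 1, None, None],
    [1, None, 1, None],
]

def assume_negative(targets):
    """Baseline assumption: treat every missing entry as irrelevant (0)."""
    return [[0 if t is None else t for t in row] for row in targets]

def masked_bce(probs, targets):
    """Binary cross-entropy averaged over the non-missing entries only."""
    total, count = 0.0, 0
    for p_row, t_row in zip(probs, targets):
        for p, t in zip(p_row, t_row):
            if t is None:  # skip entries whose label is unknown
                continue
            p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count if count else 0.0

# A degenerate predictor that marks every label as relevant.
all_ones = [[1.0] * 4 for _ in range(3)]
loss_ignoring_missing = masked_bce(all_ones, observed)
loss_assuming_negative = masked_bce(all_ones, assume_negative(observed))
```

In this toy case, ignoring the missing entries gives the predict-everything model a near-zero loss, because only positive labels are observed. Treating missing entries as negative penalizes that model heavily, but it also penalizes any true label that merely went unannotated. The sketch is only meant to show why the handling of missing entries is a modeling assumption in its own right.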

Abstract

In practice, data are often associated with multiple labels, which has made multi-label learning (MLL) a prominent research topic. The last two decades have witnessed the success of MLL, which depends on complete and accurate supervision. However, obtaining such supervision in practice is always laborious and sometimes even impossible. To circumvent this dilemma, incomplete multi-label learning (InMLL) has emerged, aiming to learn from incompletely labeled data. To date, numerous InMLL methods have been proposed to narrow the performance gap with complete MLL, yet a systematic review of InMLL is still absent. In this paper, we not only attempt to fill this gap but also strive to pave the way for innovative research. Specifically, we trace the origin of InMLL, analyze its challenges, and build a taxonomy of InMLL from the data-oriented and algorithm-oriented perspectives. We also present real applications of InMLL in various domains. More importantly, we highlight several potential future trends, including four open problems that are more in line with practice and three under-explored or unexplored techniques for addressing the challenges of InMLL, which may shed new light on novel research directions in the field.

Paper Structure

This paper contains 13 sections, 5 equations, and 3 figures.

Figures (3)

  • Figure 1: An illustration of incomplete multi-label data. The left part is a real-world image. Labels in the upper right, shown in red and marked "1", are the given labels; labels in the lower right, shown in gray and marked "?", are the missing labels. Note that all seven labels belong to the ground truth.
  • Figure 2: A taxonomy of incomplete multi-label learning.
  • Figure 3: Applications of incomplete multi-label learning.