libcll: an Extendable Python Toolkit for Complementary-Label Learning

Nai-Xuan Ye, Tan-Ha Mai, Hsiu-Hsuan Wang, Wei-I Lin, Hsuan-Tien Lin

TL;DR

\texttt{libcll} is an extensible Python toolkit for CLL research. It provides a universal interface that supports a wide range of generation assumptions, both synthetic and real-world datasets, and key CLL algorithms.

Abstract

Complementary-label learning (CLL) is a weakly supervised learning paradigm for multiclass classification, where only complementary labels -- indicating classes an instance does not belong to -- are provided to the learning algorithm. Despite CLL's increasing popularity, previous studies highlight two main challenges: (1) inconsistent results arising from varied assumptions on complementary label generation, and (2) high barriers to entry due to the lack of a standardized evaluation platform across datasets and algorithms. To address these challenges, we introduce \texttt{libcll}, an extensible Python toolkit for CLL research. \texttt{libcll} provides a universal interface that supports a wide range of generation assumptions, both synthetic and real-world datasets, and key CLL algorithms. The toolkit is designed to mitigate inconsistencies and streamline the research process, with easy installation, comprehensive usage guides, and quickstart tutorials that facilitate efficient adoption and implementation of CLL techniques. Extensive ablation studies conducted with \texttt{libcll} demonstrate its utility in generating valuable insights to advance future CLL research.

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Coverage of the libcll toolkit. The figure gives an overview of the toolkit's key components: 15 datasets spanning synthetic and real-world scenarios, 5 models commonly used in complementary-label learning (CLL), 4 CLL assumptions, and 14 CLL algorithms. To the best of our knowledge, libcll is the first comprehensive toolkit specifically dedicated to CLL.
  • Figure 2: The development of complementary-label learning.
  • Figure 3: Training pipeline of libcll. The process begins by initializing the dataset, either real-world or synthetic. For real-world datasets, the pipeline proceeds directly to the calculation of the true transition matrix. For synthetic datasets, complementary labels (CLs) are first generated from user-specified parameters, such as the number of CLs per instance and their distribution (uniform, biased, or noisy). The preprocessor then calculates the transition matrix from the true labels and their corresponding CLs for each instance. Afterward, the data preprocessor prepares the DataLoader along with the calculated transition matrix. Training is then started with the selected strategy module, the transition matrix, and the DataLoader.
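
The synthetic branch of the pipeline in Figure 3 can be illustrated with a minimal, standard-library sketch. It is not the libcll API: the function names below are hypothetical, and the sketch assumes a single CL per instance drawn under the uniform assumption, where each wrong class is equally likely. It generates the CLs and then computes the empirical transition matrix from the true labels and their CLs, as the caption describes.

```python
import random


def uniform_transition_matrix(num_classes):
    """T[y][c] = P(complementary label c | true label y), uniform assumption.

    Hypothetical helper, not part of libcll: the diagonal is zero because a
    complementary label never equals the true label.
    """
    off_diag = 1.0 / (num_classes - 1)
    return [[0.0 if c == y else off_diag for c in range(num_classes)]
            for y in range(num_classes)]


def generate_complementary_labels(true_labels, T, rng):
    """Sample one complementary label per instance from row T[y]."""
    classes = range(len(T))
    return [rng.choices(classes, weights=T[y], k=1)[0] for y in true_labels]


def empirical_transition_matrix(true_labels, comp_labels, num_classes):
    """Estimate T by counting (true label, complementary label) pairs."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for y, c in zip(true_labels, comp_labels):
        counts[y][c] += 1
    T_hat = []
    for row in counts:
        total = sum(row)
        # Rows for classes that never occur are left as all zeros.
        T_hat.append([n / total if total else 0.0 for n in row])
    return T_hat


rng = random.Random(0)
num_classes = 4
true_labels = [rng.randrange(num_classes) for _ in range(2000)]

T = uniform_transition_matrix(num_classes)
comp_labels = generate_complementary_labels(true_labels, T, rng)
T_hat = empirical_transition_matrix(true_labels, comp_labels, num_classes)

for row in T_hat:
    print([round(p, 3) for p in row])
```

A biased or noisy setting would only change the matrix passed to the sampler: a biased matrix has unequal off-diagonal entries, and a noisy one places some probability mass on the diagonal. The counting step stays the same.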