Minimal Learning Machine for Multi-Label Learning

Joonas Hämäläinen; Antoine Hubermont; Amauri Souza; César L. C. Mattos; João P. P. Gomes; Tommi Kärkkäinen

TL;DR

The paper tackles multi-label classification (MLC) by reframing learning as a distance-regression problem that maps input-space distances to output-space distances. It introduces ML-MLM, which combines the distance regression of the minimal learning machine (MLM) with inverse distance weighting to produce deterministic, ranking-friendly predictions, with hyper-parameters selected via a closed-form LOOCV-based ranking loss. Empirical results on ten datasets show that ML-MLM achieves ranking performance competitive with state-of-the-art methods and favorable bipartition metrics, while offering interpretable uncertainty estimates through predicted distances. The approach provides a lightweight, parameter-efficient alternative for small-to-moderate MLC tasks and outlines practical considerations for uncertainty, thresholding, and complexity, along with avenues for extensions such as ensembles and feature selection.

Abstract

The minimal learning machine is a distance-based supervised method that constructs a predictive model from data by learning a mapping between input and output distance matrices. In this paper, we propose new methods and evaluate how their core component, the distance mapping, can be adapted to multi-label learning. The proposed approach combines the distance mapping with inverse distance weighting. Although the proposal is one of the simplest methods in the multi-label learning literature, it achieves state-of-the-art performance for small to moderate-sized multi-label learning problems. In addition to its simplicity, the proposed method is fully deterministic: its hyper-parameter can be selected via a ranking loss-based statistic that has a closed form, thus avoiding conventional cross-validation-based hyper-parameter tuning. Moreover, we demonstrate that, owing to its simple construction based on a linear distance mapping, the proposed method can assess the uncertainty of its predictions for multi-label classification, which is a valuable capability for data-centric machine learning pipelines.
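The combination of distance mapping and inverse distance weighting described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes Euclidean distances, uses every training point as a reference point, fits the linear distance mapping by ordinary least squares, and clamps predicted distances to a small positive value before weighting. The function names (`pairwise_dist`, `fit_distance_regression`, `predict_scores`), the `eps` parameter, and the toy data are hypothetical.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def fit_distance_regression(X, Y):
    """Least-squares linear map from input distances to label-space distances.

    Assumption: all training points double as reference points.
    """
    Dx = pairwise_dist(X, X)  # input-space distance matrix
    Dy = pairwise_dist(Y, Y)  # label-space distance matrix
    B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)
    return B

def predict_scores(X_new, X_ref, Y_ref, B, P=2.0, eps=1e-12):
    """Label scores as an inverse-distance-weighted average of reference labels."""
    d_in = pairwise_dist(X_new, X_ref)
    # Predicted label-space distances; clamping is an assumption made here
    # so that zero or negative predictions do not break the weighting.
    d_out = np.maximum(np.abs(d_in @ B), eps)
    W = 1.0 / d_out ** P
    return (W @ Y_ref) / W.sum(axis=1, keepdims=True)

# Toy data: four instances with four unique label sets (hypothetical values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])

B = fit_distance_regression(X, Y)
scores = predict_scores(np.array([[0.1, 0.2]]), X, Y, B, P=2.0)
print(scores)  # one score per label; higher means ranked as more relevant
```

Because the weights are positive and the reference labels are binary, each score is a convex combination of 0s and 1s, so the scores can be used directly for ranking labels or thresholded to obtain a bipartition.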

Paper Structure (23 sections, 1 theorem, 26 equations, 5 figures, 10 tables, 2 algorithms)

Introduction
Background
A brief review of multi-label classification
Distance regression and distance weighting schemes
Summary of empirical results
Multi-label minimal learning machine
Basic formulation with an approximation result
Multi-label algorithm
Model selection
Experiments and Results
Experimental Setup
Results
Ranking-based metrics
Bipartition-based metrics
Assessing uncertainty and interpreting ML-MLM
...and 8 more sections

Key Result

Proposition 1

Let $L \geq 2$ denote the number of classes in an MLC problem with $\mathbb{Y}=\{0,1\}^L$. Also, assume that the set of output reference points $\mathcal{T}$ contains all possible multi-label assignments. Then, the following holds:

Figures (5)

Figure 1: Illustration of the main components of an ML-MLM prediction for a model trained on a toy dataset. The dataset consists of four instances associated with four unique label sets. A distance regression model $\mathbf{B}$ has been trained with four input space reference points $\{\mathbf{r}_i\}_{i=1}^4$ and the corresponding label space vectors $\{\mathbf{y}_i\}_{i=1}^4$ (treated as output space reference points in the MLM context). The IDW weighting is computed with $P = 2$. The sample images were created with Stable Diffusion.
Figure 2: Results for ranking-based metrics.
Figure 3: Results for bipartition-based metrics.
Figure 4: Minimum distances predicted by the distance regression model for the test sets. Predicted distances closer to zero indicate that the distance regression model is interpolating more. The model's overall interpolation ability is strongest for Tmc2007 and weakest for Delicious.
Figure 5: (a) shows how the weight values change as a function of the MLM's predicted distances for different values of the power parameter. (b) shows how Ranking Loss changes as a function of the power parameter $P$ for the Yeast dataset.
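The relationship summarized in the caption of Figure 5(a) can be explored numerically with a short sketch. It assumes the common inverse distance weighting form $w_i = 1 / d_i^P$ with weights normalized to sum to one; the normalization, the function name `idw_weights`, and the example distances are assumptions for illustration and are not taken from the paper.

```python
def idw_weights(distances, power):
    """Normalized inverse distance weights, w_i proportional to 1 / d_i**power."""
    raw = [1.0 / (d ** power) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical predicted distances to three reference points.
distances = [0.5, 1.0, 2.0]

for power in (1, 2, 8):
    weights = idw_weights(distances, power)
    print(power, [round(w, 4) for w in weights])
```

As the power grows, the weight of the reference point with the smallest predicted distance approaches one, so the prediction behaves increasingly like a nearest-neighbor rule; with a power of zero all reference points would be weighted equally.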

Theorems & Definitions (3)

Proposition 1: Nearest neighbor MLM for multi-label learning
proof
proof
