Consistent algorithms for multi-label classification with macro-at-$k$ metrics

Erik Schultheis; Wojciech Kotłowski; Marek Wydmuch; Rohit Babbar; Strom Borman; Krzysztof Dembczyński

TL;DR

The paper addresses optimizing complex macro-at-$k$ metrics in multi-label classification under budgeted predictions. It shows that for linear macro-utilities the optimal rule reduces to top-$k$ labels after an affine transform of label marginals, and it develops a Frank-Wolfe–based, statistically consistent learning algorithm that extends to nonlinear metrics via gradient-based linearization. Theoretical contributions establish the existence and form of the optimal confusion tensor and provide convergence guarantees for the proposed algorithm, including a regret bound that accounts for marginal-estimation error. Empirically, the approach yields competitive macro-measures on extreme multi-label benchmarks and scales to thousands of labels, with practical considerations like sparse marginals and tail-label sensitivity highlighted. Overall, the work provides a principled, scalable framework for consistent optimization of complex macro-at-$k$ metrics in budgeted multi-label problems.

Abstract

We consider the optimization of complex performance metrics in multi-label classification under the population utility framework. We mainly focus on metrics linearly decomposable into a sum of binary classification utilities applied separately to each label with an additional requirement of exactly $k$ labels predicted for each instance. These "macro-at-$k$" metrics possess desired properties for extreme classification problems with long tail labels. Unfortunately, the at-$k$ constraint couples the otherwise independent binary classification tasks, leading to a much more challenging optimization problem than standard macro-averages. We provide a statistical framework to study this problem, prove the existence and the form of the optimal classifier, and propose a statistically consistent and practical learning algorithm based on the Frank-Wolfe method. Interestingly, our main results concern even more general metrics being non-linear functions of label-wise confusion matrices. Empirical results provide evidence for the competitive performance of the proposed approach.
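As a rough illustration of what a "macro-at-$k$" metric computes, the sketch below predicts exactly $k$ labels per instance, builds the label-wise confusion entries, applies a binary utility to each label separately, and averages over labels. The function names (`top_k_predictions`, `labelwise_confusion`, `macro_at_k`) and the toy data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def top_k_predictions(scores, k):
    """Return a 0/1 matrix with exactly k ones per row (the k highest scores)."""
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    pred = np.zeros(scores.shape, dtype=int)
    np.put_along_axis(pred, idx, 1, axis=1)
    return pred

def labelwise_confusion(y_true, y_pred):
    """Per-label confusion entries as fractions of instances, each of shape (m,)."""
    tp = np.mean((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.mean((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.mean((y_true == 1) & (y_pred == 0), axis=0)
    tn = np.mean((y_true == 0) & (y_pred == 0), axis=0)
    return tp, fp, fn, tn

def safe_ratio(num, den):
    """Element-wise num / den, returning 0 where the denominator is 0."""
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def precision(tp, fp, fn, tn):
    return safe_ratio(tp, tp + fp)

def recall(tp, fp, fn, tn):
    return safe_ratio(tp, tp + fn)

def macro_at_k(y_true, scores, k, binary_utility):
    """Average a binary utility over labels, with exactly k labels predicted per instance."""
    pred = top_k_predictions(scores, k)
    return binary_utility(*labelwise_confusion(y_true, pred)).mean()

# Toy data (hypothetical): 4 instances, 3 labels, budget k = 1.
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
scores = np.array([[0.9, 0.1, 0.0],
                   [0.8, 0.2, 0.0],
                   [0.1, 0.7, 0.2],
                   [0.1, 0.2, 0.7]])
macro_recall = macro_at_k(y_true, scores, 1, recall)
macro_precision = macro_at_k(y_true, scores, 1, precision)
```

Because every row must contain exactly $k$ positive predictions, choosing a label for one instance uses up part of that instance's budget. This is the coupling between the otherwise independent binary tasks that the abstract refers to.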

Paper Structure (26 sections, 21 theorems, 81 equations, 1 figure, 8 tables, 1 algorithm)

Introduction
Related work
Problem statement
The optimal classifier
Consistent algorithms
Experiments
Conclusions
Madow's sampling scheme
The optimal classifier for linear metrics
The optimal classifier for general metrics
Consistency of Frank-Wolfe
VC-dimension lemma
Additional lemmas
Bound for Linear Optimization Step
Consistency of fixed-step-schedule Frank-Wolfe
...and 11 more sections

Key Result

Theorem 4.1

The optimal classifier $\optimal\hypothesis \coloneqq \argmax_{\hypothesis \in \hypothesisspace} \taskloss(\hypothesis)$ for $\taskloss(\hypothesis) = \gaintensor \cdot \confusiontensor(\hypothesis)$ is given by a top-$k$ selection applied to an affine transform of the label marginals, with slope vector $\gainslope$ and intercept vector $\gainintercept$, where $\odot$ denotes the coordinate-wise product of vectors. The display equations for the classifier and for $\gainslope$ and $\gainintercept$ are missing from this excerpt. $\operatorname{top}_{k}(\examplevec)$ returns a $k$-ho…
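The rule described here (take the top $k$ labels after an affine transform of the label marginals) can be sketched as follows. The excerpt does not show the formulas for the slope and intercept vectors, so `gain_to_affine` is an assumption: it compares the expected gain of predicting 1 versus 0 for each label, under an assumed indexing `G[j, true, predicted]`. All names (`affine_top_k`, `gain_to_affine`, `eta`, `priors`) are hypothetical.

```python
import numpy as np

def gain_to_affine(G):
    """Assumed derivation, not copied from the paper.

    G has shape (m, 2, 2), indexed as G[j, true_label, predicted_label].
    With marginal eta_j, predicting 1 instead of 0 for label j changes the
    expected gain by eta_j * (G11 - G10) + (1 - eta_j) * (G01 - G00),
    which is a_j * eta_j + b_j with the values below.
    """
    a = G[:, 1, 1] + G[:, 0, 0] - G[:, 0, 1] - G[:, 1, 0]
    b = G[:, 0, 1] - G[:, 0, 0]
    return a, b

def affine_top_k(eta, a, b, k):
    """k-hot prediction: select the k labels with the largest a_j * eta_j + b_j."""
    scores = a * eta + b  # coordinate-wise product, then shift
    idx = np.argpartition(-scores, k - 1, axis=-1)[..., :k]
    pred = np.zeros(scores.shape, dtype=int)
    np.put_along_axis(pred, idx, 1, axis=-1)
    return pred

# Hypothetical example: a recall-like gain that rewards a true positive on
# label j by 1 / prior_j, which up-weights rare (tail) labels.
priors = np.array([0.5, 0.1, 0.01])
eta = np.array([0.5, 0.2, 0.05])      # label marginals for one instance
G = np.zeros((3, 2, 2))
G[:, 1, 1] = 1.0 / priors
a, b = gain_to_affine(G)
pred_k1 = affine_top_k(eta, a, b, 1)
pred_k2 = affine_top_k(eta, a, b, 2)
plain_k1 = affine_top_k(eta, np.ones(3), np.zeros(3), 1)
```

In this toy case the plain top-1 rule picks the head label with the largest marginal, while the affine rule picks the tail label, whose marginal is largest relative to its prior.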

Figures (1)

Figure 1: Comparison of the baseline algorithms with the PU inference with mixed objectives for $k \in \{3, 5, 10\}$. The green line shows the results for different interpolations between two measures.

Theorems & Definitions (45)

Definition 3.0: Binary Confusion Matrix Measure
Definition 3.0: Confusion Tensor Measure
Theorem 4.1
Proof (sketch, full proof in Appendix \ref{app:linear_metric})
Theorem 4.4
Proof (sketch, full proof in Appendix \ref{app:the_optimal_classifier})
Theorem 5.1: Consistency of Frank-Wolfe
Lemma 5.1: VC dimension for linear top-k classifiers
Theorem A.1
Proof
...and 35 more
