Multiclass threshold-based classification and model evaluation

Edoardo Legnaro, Sabrina Guastavino, Francesco Marchetti

TL;DR

The paper generalizes multiclass classification by replacing the argmax rule with a threshold-based decision framework on the (m-1)-simplex, where m is the number of classes. This enables a posteriori optimization of prediction scores without retraining. It defines a simplex-based classification collection and a multidimensional threshold, and it derives a tuning algorithm. It also develops ROC clouds, summarized by a novel Distance From Point (DFP) metric, to evaluate multiclass trade-offs jointly. Empirical results across diverse datasets show that simplex threshold tuning can improve accuracy and macro-F1, particularly in imbalanced settings. They also show that the ROC-cloud framework provides a coherent multiclass alternative to One-vs-Rest (OvR) analyses. The work also discusses practical considerations, including computational cost and strategies to mitigate the curse of dimensionality, and highlights the potential for broad adoption in multiclass problems.
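The summary does not spell out the decision rule, so the sketch below assumes one natural generalization of the binary threshold, which may differ from the paper's exact definition. Given a threshold point t on the (m-1)-simplex, it predicts the class i that maximizes the margin p_i - t_i. When t is the centroid (1/m, ..., 1/m), every score is shifted by the same constant, so the rule reduces to plain argmax. For m = 2 and t = (1 - τ, τ), the condition p_1 - τ > p_0 - (1 - τ) simplifies to p_1 > τ, because p_0 = 1 - p_1; this is the familiar binary threshold. Tuning is an exhaustive search over a regular grid on the simplex. The names simplex_grid, predict, accuracy, macro_f1 and tune_threshold are hypothetical, and the toy scores are made up for illustration. This is not presented as the authors' algorithm.

```python
from itertools import combinations


def simplex_grid(m, n):
    """Yield every point of the (m-1)-simplex whose coordinates are multiples of 1/n."""
    # Stars and bars: place m-1 bars among n+m-1 slots; the gap sizes sum to n.
    slots = n + m - 1
    for bars in combinations(range(slots), m - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(slots - prev - 1)
        yield tuple(c / n for c in counts)


def predict(p, t):
    """Assumed threshold rule: choose the class with the largest margin p_i - t_i."""
    return max(range(len(p)), key=lambda i: p[i] - t[i])


def accuracy(probs, labels, t):
    """Fraction of samples whose thresholded prediction matches the label."""
    return sum(predict(p, t) == y for p, y in zip(probs, labels)) / len(labels)


def macro_f1(probs, labels, t):
    """Unweighted mean of per-class F1; a class never seen nor predicted scores 0."""
    preds = [predict(p, t) for p in probs]
    scores = []
    for c in range(len(t)):
        tp = sum(yh == c and y == c for yh, y in zip(preds, labels))
        fp = sum(yh == c and y != c for yh, y in zip(preds, labels))
        fn = sum(yh != c and y == c for yh, y in zip(preds, labels))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def tune_threshold(probs, labels, n=10, metric=accuracy):
    """Exhaustive grid search; the centroid (plain argmax) is tried first."""
    m = len(probs[0])
    centroid = tuple(1.0 / m for _ in range(m))
    candidates = [centroid, *simplex_grid(m, n)]
    # max() keeps the first maximizer, so ties are resolved in favor of argmax.
    return max(candidates, key=lambda t: metric(probs, labels, t))


# Toy validation scores (3 classes) where argmax misclassifies two class-1 samples.
probs = [(0.50, 0.30, 0.20), (0.45, 0.40, 0.15), (0.20, 0.50, 0.30), (0.40, 0.35, 0.25)]
labels = [0, 1, 1, 1]
best_t = tune_threshold(probs, labels)
print(best_t, accuracy(probs, labels, best_t))
```

The grid has C(n + m - 1, m - 1) points, which is 66 for m = 3 and n = 10. This count grows quickly with the number of classes m, which is one concrete form of the computational cost and curse of dimensionality that the TL;DR mentions. Including the centroid guarantees that the tuned threshold is never worse than argmax on the tuning data. Passing metric=macro_f1 targets macro-F1 instead of accuracy. As with binary threshold tuning, it is natural to tune on held-out validation scores rather than on the test set.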

Abstract

In this paper, we introduce a threshold-based framework for multiclass classification that generalizes the standard argmax rule. We do so by replacing the probabilistic interpretation of softmax outputs with a geometric one on the multidimensional simplex, where the classification depends on a multidimensional threshold. This change of perspective enables an a posteriori optimization of the classification score for any trained classification network. The optimization works through threshold tuning, as is usually done in the binary setting, and thus further refines the network's predictive capability. Our experiments indeed show that multidimensional threshold tuning yields performance improvements across various networks and datasets. Moreover, we derive a multiclass ROC analysis based on ROC clouds, i.e., the attainable (FPR, TPR) operating points induced by a single multiclass threshold. We summarize these clouds via a Distance From Point (DFP) score, measured from the ideal point (0, 1). This yields a coherent alternative to standard One-vs-Rest (OvR) curves that aligns with the observed tuning gains.
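The abstract describes ROC clouds only as the (FPR, TPR) operating points induced by a single multiclass threshold, summarized by a distance to (0, 1). The sketch below fills in the missing details as explicit assumptions, which may not match the paper's definitions:

- Each threshold t on a regular simplex grid yields one point, whose coordinates are the macro-averaged one-vs-rest FPR and TPR of the labels predicted with t.
- Predictions use the margin rule argmax_i (p_i - t_i), which reduces to plain argmax when t is the simplex centroid.
- DFP is taken as the smallest Euclidean distance from the cloud to (0, 1).

The names simplex_grid, predict, macro_rates, roc_cloud and dfp are hypothetical, and the toy scores are made up.

```python
import math
from itertools import combinations


def simplex_grid(m, n):
    """Yield every point of the (m-1)-simplex whose coordinates are multiples of 1/n."""
    # Stars and bars: place m-1 bars among n+m-1 slots; the gap sizes sum to n.
    slots = n + m - 1
    for bars in combinations(range(slots), m - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(slots - prev - 1)
        yield tuple(c / n for c in counts)


def predict(p, t):
    """Assumed threshold rule: argmax of the margins p_i - t_i."""
    return max(range(len(p)), key=lambda i: p[i] - t[i])


def macro_rates(probs, labels, t):
    """Assumed cloud coordinates: macro-averaged one-vs-rest (FPR, TPR) under threshold t."""
    m = len(t)
    preds = [predict(p, t) for p in probs]
    fpr_sum = tpr_sum = 0.0
    for c in range(m):
        tp = sum(yh == c and y == c for yh, y in zip(preds, labels))
        fn = sum(yh != c and y == c for yh, y in zip(preds, labels))
        fp = sum(yh == c and y != c for yh, y in zip(preds, labels))
        tn = len(labels) - tp - fn - fp
        tpr_sum += tp / (tp + fn) if tp + fn else 0.0
        fpr_sum += fp / (fp + tn) if fp + tn else 0.0
    return fpr_sum / m, tpr_sum / m


def roc_cloud(probs, labels, n=10):
    """One (FPR, TPR) operating point per grid threshold: a cloud, not a curve."""
    m = len(probs[0])
    return [macro_rates(probs, labels, t) for t in simplex_grid(m, n)]


def dfp(cloud):
    """Assumed summary: smallest Euclidean distance from the cloud to the ideal point (0, 1)."""
    return min(math.hypot(fpr, tpr - 1.0) for fpr, tpr in cloud)


# Toy scores for 3 classes; plain argmax misclassifies the second sample.
probs = [(0.50, 0.30, 0.20), (0.45, 0.40, 0.15), (0.20, 0.50, 0.30), (0.30, 0.30, 0.40)]
labels = [0, 1, 1, 2]
cloud = roc_cloud(probs, labels)
print(len(cloud), dfp(cloud))
```

An OvR curve sweeps a separate scalar threshold for each class. Here, by contrast, every point comes from one shared threshold vector, so the cloud reflects joint trade-offs among the classes. A DFP of 0 means that some grid threshold separates the toy data perfectly, which is the case here for t = (0.4, 0.3, 0.3).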

