
Multiclass threshold-based classification and model evaluation

Edoardo Legnaro, Sabrina Guastavino, Francesco Marchetti

TL;DR

The paper generalizes multiclass classification by replacing the argmax rule with a threshold-based decision framework on the $(m-1)$-simplex, enabling a posteriori optimization of prediction scores without retraining. It defines a simplex classification collection and a multidimensional threshold, derives a tuning algorithm, and develops ROC clouds with a novel Distance From Point (DFP) metric to evaluate joint multiclass trade-offs. Experiments across diverse datasets show that simplex threshold tuning can improve accuracy and macro-F1, particularly on unbalanced data, and that the ROC-cloud framework provides a coherent multiclass alternative to One-vs-Rest (OvR) analyses. The work also discusses practical considerations, including computational cost and strategies to mitigate the curse of dimensionality, suggesting that the approach can be applied broadly to multiclass problems.

Abstract

In this paper, we introduce a threshold-based framework for multiclass classification that generalizes the standard argmax rule. We do so by replacing the probabilistic interpretation of softmax outputs with a geometric one on the multidimensional simplex, where the classification depends on a multidimensional threshold. This change of perspective enables, for any trained classification network, an a posteriori optimization of the classification score by means of threshold tuning, as is usually done in the binary setting, thus allowing a further refinement of the network's predictive capability. Our experiments indeed show that multidimensional threshold tuning yields performance improvements across various networks and datasets. Moreover, we derive a multiclass ROC analysis based on ROC clouds, i.e., the attainable (FPR, TPR) operating points induced by a single multiclass threshold, and summarize them via a Distance From Point (DFP) score with respect to the point $(0,1)$. This yields a coherent alternative to standard One-vs-Rest (OvR) curves and aligns with the observed tuning gains.
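The a posteriori threshold tuning described above can be sketched as a grid search over the simplex. This is a minimal illustration under stated assumptions, not the authors' algorithm: the decision rule $\arg\max_i (p_i - \tau_i)$ is an assumed threshold-shifted argmax (it coincides with the standard argmax at the barycenter), the grid of points with coordinates $k_i/n$ is our choice, and the names `simplex_grid`, `accuracy` and `tune_threshold` are hypothetical. With $m=3$ and $n=200$ this grid has 20301 points, the same count as the grid reported for Figure 3.

```python
import itertools
import numpy as np


def simplex_grid(m, n):
    """Points (k_1/n, ..., k_m/n) with nonnegative integers k_i summing to n."""
    pts = []
    # Stars and bars: place m-1 bars among n+m-1 slots; the gaps give the k_i.
    for bars in itertools.combinations(range(n + m - 1), m - 1):
        cuts = (-1,) + bars + (n + m - 1,)
        pts.append([cuts[i + 1] - cuts[i] - 1 for i in range(m)])
    return np.array(pts, dtype=float) / n


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


def tune_threshold(probs, labels, n=200, score=accuracy):
    """Grid-search a threshold tau on the simplex that maximizes `score`.

    ASSUMPTION: predictions use the threshold-shifted rule argmax_i (p_i - tau_i),
    which is illustrative and not necessarily the paper's decision rule.
    """
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    m = probs.shape[1]
    # Start from the barycenter, where the assumed rule equals plain argmax,
    # so the tuned score is never below the argmax score on this data.
    best_tau = np.full(m, 1.0 / m)
    best = score(labels, np.argmax(probs - best_tau, axis=1))
    for tau in simplex_grid(m, n):
        s = score(labels, np.argmax(probs - tau, axis=1))
        if s > best:
            best_tau, best = tau, s
    return best_tau, best


# Usage on synthetic, deliberately miscalibrated validation scores
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
logits = rng.normal(size=(300, 3))
logits[np.arange(300), labels] += 1.5  # informative signal
logits[:, 0] += 1.0                    # systematic bias toward class 0
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
tau_star, acc_star = tune_threshold(probs, labels, n=50)
print(tau_star, acc_star)
```

Any metric with signature `score(y_true, y_pred)` can be passed instead of accuracy; for macro-F1, for instance, `lambda y, yhat: sklearn.metrics.f1_score(y, yhat, average="macro")`. The threshold should be tuned on a validation set and then kept fixed for evaluation on test data. The exhaustive grid grows combinatorially with $m$, which is the curse-of-dimensionality cost the paper discusses.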

Paper Structure

This paper contains 12 sections, 1 theorem, 21 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

The simplex classification collection defined in Eq. (eq:natural) is proper and generalizes the argmax procedure, i.e., the classical argmax rule is a particular case of it.
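To make the "generalizes argmax" property concrete, here is one threshold-shifted decision rule that has it. It is an assumption for illustration, not necessarily the collection of Eq. (eq:natural): each sample is assigned to $\arg\max_i (p_i - \tau_i)$. At $\boldsymbol{\tau} = (1/m, \dots, 1/m)$ the same constant is subtracted from every score, so the argmax is unchanged; for $m = 2$ the rule reduces to predicting the first class iff $p_1 \ge \tau_1$, the usual binary threshold. The function name `simplex_threshold_predict` is hypothetical.

```python
import numpy as np


def simplex_threshold_predict(probs, tau):
    """Label each row of `probs` with argmax_i (p_i - tau_i).

    ASSUMPTION: this threshold-shifted argmax is an illustrative rule, not
    necessarily the simplex classification collection of eq:natural.
    """
    probs = np.asarray(probs, dtype=float)  # (n_samples, m), rows on the simplex
    tau = np.asarray(tau, dtype=float)      # (m,), a point of the simplex
    return np.argmax(probs - tau, axis=1)   # ties go to the lowest class index


probs = np.array([[0.50, 0.30, 0.20],
                  [0.20, 0.45, 0.35],
                  [0.10, 0.30, 0.60]])

# Barycenter: the same constant is subtracted from every score, so plain argmax.
center = np.full(3, 1 / 3)
assert np.array_equal(simplex_threshold_predict(probs, center),
                      np.argmax(probs, axis=1))

# Raising tau_3 makes the third class (index 2) harder to predict.
print(simplex_threshold_predict(probs, [0.2, 0.2, 0.6]))  # [0 1 1]
```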

Figures (4)

  • Figure 1: For $m=3$, the three regions $R_1(\boldsymbol{\tau})$ (red), $R_2(\boldsymbol{\tau})$ (green) and $R_3(\boldsymbol{\tau})$ (blue). From left to right: $\boldsymbol{\tau}=(1/3,1/3,1/3)$, $\boldsymbol{\tau}=(1/2,1/3,1/6)$ and $\boldsymbol{\tau}=(1/8,3/4,1/8)$, each marked by a big black dot. The color of each sample (colored dots) indicates its true label. Number of misclassifications from left to right: $7$, $8$ and $10$.
  • Figure 2: For $m=3$, the three regions $\bar{R}_1(\boldsymbol{\tau})$ (red), $\bar{R}_2(\boldsymbol{\tau})$ (green) and $\bar{R}_3(\boldsymbol{\tau})$ (blue). From left to right: $\boldsymbol{\tau}=(1/3,1/3,1/3)$, $\boldsymbol{\tau}=(1/2,1/3,1/6)$ and $\boldsymbol{\tau}=(1/8,3/4,1/8)$, each marked by a big black dot. Unlike in Figure 1, here each region is defined independently of the others, so regions may overlap.
  • Figure 3: Accuracy (left) and Macro F1 Score (right) heatmaps across the simplex for the validation set. The color intensity represents the metric's value, with lighter shades indicating higher performance. The standard threshold $\boldsymbol{\tau} = (1/3, 1/3, 1/3)$ is marked with a red circle, while the threshold giving the best score is indicated by a green star, corresponding to $\boldsymbol{\tau}^\star = (0.00, 0.06, 0.94)$ for accuracy and $\boldsymbol{\tau}^\star = (0.01, 0.08, 0.91)$ for the Macro F1 Score. Here, the grid of evaluation thresholds consists of $M=20301$ points.
  • Figure 4: ROC clouds with OvR ROC curves (orange). Top: SOLAR-STORM1. Bottom: OCTMNIST (validation set).
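The ROC clouds of Figure 4 and the DFP score can be approximated along the following lines. This is a sketch under explicit assumptions: this summary does not state how a multiclass threshold is mapped to a single (FPR, TPR) pair or how DFP aggregates the cloud, so we assume macro-averaged one-vs-rest rates of the hard predictions at each $\boldsymbol{\tau}$, the assumed rule $\arg\max_i (p_i - \tau_i)$, a uniform grid on the simplex, and DFP as the minimum Euclidean distance of the cloud from $(0,1)$. All function names are hypothetical.

```python
import itertools
import numpy as np


def simplex_grid(m, n):
    """Uniform grid on the (m-1)-simplex: points (k_1/n, ..., k_m/n), sum k_i = n."""
    pts = []
    for bars in itertools.combinations(range(n + m - 1), m - 1):  # stars and bars
        cuts = (-1,) + bars + (n + m - 1,)
        pts.append([cuts[i + 1] - cuts[i] - 1 for i in range(m)])
    return np.array(pts, dtype=float) / n


def macro_rates(labels, pred, m):
    """Macro-averaged one-vs-rest (FPR, TPR) of hard predictions."""
    fprs, tprs = [], []
    for c in range(m):
        pos, hit = labels == c, pred == c
        tp, fn = np.sum(pos & hit), np.sum(pos & ~hit)
        fp, tn = np.sum(~pos & hit), np.sum(~pos & ~hit)
        tprs.append(tp / max(tp + fn, 1))  # guard against empty classes
        fprs.append(fp / max(fp + tn, 1))
    return float(np.mean(fprs)), float(np.mean(tprs))


def roc_cloud(probs, labels, n=50):
    """One (FPR, TPR) point per grid threshold (ASSUMED rule: argmax of p - tau)."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    m = probs.shape[1]
    return np.array([macro_rates(labels, np.argmax(probs - tau, axis=1), m)
                     for tau in simplex_grid(m, n)])


def dfp(cloud):
    """ASSUMED DFP: smallest Euclidean distance from a cloud point to (0, 1)."""
    return float(np.min(np.hypot(cloud[:, 0], cloud[:, 1] - 1.0)))


# Usage on synthetic scores
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.normal(size=(200, 3))
logits[np.arange(200), labels] += 1.5
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
cloud = roc_cloud(probs, labels, n=30)
print(cloud.shape, dfp(cloud))
```

Under these assumptions, each point of the cloud comes from one threshold shared by all classes, unlike the separate per-class OvR curves, and a smaller DFP means that some single threshold brings the classes jointly closer to the ideal operating point $(0,1)$.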

Theorems & Definitions (4)

  • Definition 1: Simplex classification collection
  • Definition 2: Proper simplex classification collection
  • Proposition 1
  • Proof