Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics

Deyuan Li; Taesoo Daniel Lee; Marynel Vázquez; Nathan Tsoi

TL;DR

This work addresses the misalignment between training objectives and evaluation metrics in multiclass classification by introducing EAST, a differentiable surrogate framework for confusion-matrix-based metrics. EAST combines dynamic thresholding, a multiclass soft-set confusion matrix, and an annealing schedule to progressively align the training loss with a target metric such as macro $F_\beta$-Score or Matthews Correlation Coefficient (MCC). The theoretical analysis proves consistency and convergence of the surrogate to the true metric under mild conditions, and the experiments show improved metric alignment across diverse datasets, particularly in non-extreme imbalance settings. The method enables end-to-end optimization for task-specific evaluation criteria, which is useful under class imbalance and when a domain prioritizes one kind of error over another.

Abstract

Multiclass neural network classifiers are typically trained using cross-entropy loss but evaluated using metrics derived from the confusion matrix, such as Accuracy, $F_\beta$-Score, and Matthews Correlation Coefficient. This mismatch between the training objective and evaluation metric can lead to suboptimal performance, particularly when the user's priorities differ from what cross-entropy implicitly optimizes. For example, in the presence of class imbalance, $F_1$-Score may be preferred over Accuracy. Similarly, given a preference towards precision, the $F_{\beta=0.25}$-Score will better reflect this preference than $F_1$-Score. However, standard cross-entropy loss does not accommodate such a preference. Building on prior work leveraging soft-set confusion matrices and a continuous piecewise-linear Heaviside approximation, we propose Evaluation Aligned Surrogate Training (EAST), a novel approach to train multiclass classifiers using close surrogates of confusion-matrix based metrics, thereby aligning a neural network classifier's predictions more closely to a target evaluation metric than typical cross-entropy loss. EAST introduces three key innovations: First, we propose a novel dynamic thresholding approach during training. Second, we propose using a multiclass soft-set confusion matrix. Third, we introduce an annealing process that gradually aligns the surrogate loss with the target evaluation metric. Our theoretical analysis shows that EAST results in consistent estimators of the target evaluation metric. Furthermore, we show that the learned network parameters converge asymptotically to values that optimize for the target evaluation metric. Extensive experiments validate the effectiveness of our approach, demonstrating improved alignment between training objectives and evaluation metrics, while outperforming existing methods across many datasets.
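To make the abstract's two central ideas concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) why $F_{\beta=0.25}$ rewards precision more than $F_1$ does, and (b) one simple way a confusion matrix can be made "soft" by accumulating predicted probabilities instead of hard argmax counts. The function names (`f_beta`, `soft_confusion_matrix`, `macro_f_beta`) and the plain probability-accumulation construction are assumptions made for illustration; EAST's dynamic thresholding, Heaviside approximation, and annealing are not reproduced here.

```python
import numpy as np

def f_beta(precision, recall, beta):
    """Standard F_beta: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom

def soft_confusion_matrix(probs, labels, num_classes):
    """Illustrative soft confusion matrix (an assumption, not EAST's exact form).

    Rows index the true class, columns the predicted class. Each sample adds
    its full probability vector to the row of its true class, so one-hot
    probabilities recover the usual hard confusion matrix.
    """
    one_hot = np.eye(num_classes)[labels]   # shape (n, k)
    return one_hot.T @ probs                # shape (k, k)

def macro_f_beta(cm, beta, eps=1e-12):
    """Macro-averaged F_beta computed from a (soft or hard) confusion matrix."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp                # column sums minus the diagonal
    fn = cm.sum(axis=1) - tp                # row sums minus the diagonal
    b2 = beta ** 2
    per_class = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
    return per_class.mean()

# Two hypothetical classifiers with mirrored precision and recall.
high_precision = (0.9, 0.5)
high_recall = (0.5, 0.9)
for beta in (1.0, 0.25):
    print(beta, f_beta(*high_precision, beta), f_beta(*high_recall, beta))

# Hard predictions expressed as one-hot probabilities.
labels = np.array([0, 1, 2, 1])
hard_probs = np.eye(3)[np.array([0, 1, 1, 1])]
cm = soft_confusion_matrix(hard_probs, labels, 3)
print(cm, macro_f_beta(cm, beta=1.0))
```

With $\beta = 1$ the two hypothetical classifiers score identically, while with $\beta = 0.25$ the high-precision one scores clearly higher. Because every entry of the soft matrix is a sum of probabilities, a metric computed from it varies smoothly with the network outputs, which is what makes a confusion-matrix metric usable as a training signal.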

Paper Structure (34 sections, 9 theorems, 46 equations, 7 tables)

Introduction
Related work
Preliminaries
Method
Dynamic Thresholding
Multiclass Soft-Set Confusion Matrix
Annealing Process
Theoretical Grounding
Annealing Convergence Results
Guarantees on Dataset and Batch Size
Experiments
Results
Limitations and Conclusion
Acknowledgements
Theoretical Grounding
...and 19 more sections

Key Result

Proposition 5.2

For any $d$-dimensional probability distribution $\mathbf p$,

Theorems & Definitions (18)

Definition 5.1
Proposition 5.2
proof
Theorem 5.3
proof
Theorem 5.4
proof
Definition 5.5
Definition 5.6
Theorem 5.7
...and 8 more
