Aligning Multiclass Neural Network Classifier Criterion with Task Performance Metrics
Deyuan Li, Taesoo Daniel Lee, Marynel Vázquez, Nathan Tsoi
TL;DR
This work addresses the misalignment between training objectives and evaluation metrics in multiclass classification by introducing EAST, a differentiable surrogate framework for confusion-matrix-based metrics. EAST combines dynamic thresholding, a multiclass soft-set confusion matrix, and an annealing schedule to progressively align the training loss with a target metric, such as macro $F_\beta$-Score or MCC. The approach is theoretically grounded: the authors prove consistency and convergence of the surrogate to the true metric under mild conditions. Empirically, it improves metric alignment across diverse datasets, particularly in settings without extreme class imbalance. The method enables end-to-end optimization for task-specific evaluation criteria, which is useful under class imbalance and when a domain prioritizes a particular performance measure.
Abstract
Multiclass neural network classifiers are typically trained using cross-entropy loss but evaluated using metrics derived from the confusion matrix, such as Accuracy, $F_\beta$-Score, and Matthews Correlation Coefficient. This mismatch between the training objective and evaluation metric can lead to suboptimal performance, particularly when the user's priorities differ from what cross-entropy implicitly optimizes. For example, in the presence of class imbalance, $F_1$-Score may be preferred over Accuracy. Similarly, given a preference towards precision, the $F_{\beta=0.25}$-Score will better reflect this preference than $F_1$-Score. However, standard cross-entropy loss does not accommodate such a preference. Building on prior work leveraging soft-set confusion matrices and a continuous piecewise-linear Heaviside approximation, we propose Evaluation Aligned Surrogate Training (EAST), a novel approach to train multiclass classifiers using close surrogates of confusion-matrix-based metrics, thereby aligning a neural network classifier's predictions more closely with a target evaluation metric than typical cross-entropy loss does. EAST introduces three key innovations. First, we propose a novel dynamic thresholding approach during training. Second, we propose using a multiclass soft-set confusion matrix. Third, we introduce an annealing process that gradually aligns the surrogate loss with the target evaluation metric. Our theoretical analysis shows that EAST results in consistent estimators of the target evaluation metric. Furthermore, we show that the learned network parameters converge asymptotically to values that optimize for the target evaluation metric. Extensive experiments validate the effectiveness of our approach, demonstrating improved alignment between training objectives and evaluation metrics, while outperforming existing methods across many datasets.
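To make the idea of a soft-set confusion matrix concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the simplest possible soft assignment, in which each sample adds its softmax probabilities to the row of its true class, and it leaves out the dynamic thresholding, the piecewise-linear Heaviside approximation, and the annealing schedule that EAST adds. The function names `softmax`, `soft_confusion_matrix`, and `macro_f_beta` are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_confusion_matrix(probs, labels, num_classes):
    """Entry [i][j] accumulates the probability mass assigned to class j
    over all samples whose true class is i (assumed soft assignment)."""
    cm = [[0.0] * num_classes for _ in range(num_classes)]
    for p, y in zip(probs, labels):
        for j in range(num_classes):
            cm[y][j] += p[j]
    return cm

def macro_f_beta(cm, beta=1.0, eps=1e-12):
    """Macro-averaged F_beta computed from a soft or hard confusion matrix."""
    k = len(cm)
    b2 = beta * beta
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # rest of the true-class row
        fp = sum(cm[r][c] for r in range(k)) - tp     # rest of the predicted column
        scores.append((1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps))
    return sum(scores) / k

# Toy batch: four samples, three classes (values are made up for illustration).
logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [-0.5, 0.2, 2.2], [1.0, 0.9, 0.0]]
labels = [0, 1, 2, 1]
probs = [softmax(z) for z in logits]
cm = soft_confusion_matrix(probs, labels, num_classes=3)

# A surrogate loss that favors precision (beta = 0.25), to be minimized.
loss = 1.0 - macro_f_beta(cm, beta=0.25)
```

Because every entry of the soft matrix is a sum of probabilities, `loss` varies smoothly with the logits. In an automatic-differentiation framework the same arithmetic would therefore yield usable gradients, which a confusion matrix built from hard argmax predictions does not.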
