Meta Pattern Concern Score: A Novel Evaluation Measure with Human Values for Multi-classifiers

Yanyun Wang, Dehui Du, Yuanhao Liu

TL;DR

The paper addresses the challenge of evaluating and training multi-class classifiers in safety-critical domains under human values, such as penalizing some error types more heavily than others and tolerating a controlled concession in prediction confidence. It introduces the Meta Pattern Concern Score (MPCS), which builds meta-pattern representations from top-$k$ predictions and discretized confidences and uses a release list and a release factor to encode per-error concerns. An adjustable concession mechanism feeds into a final score $S$ that generalizes cross-entropy, reducing to it in the limit $k = 1$, $t \to \infty$. MPCS supports both evaluation and training: it enables human-value-driven model selection and adaptive learning-rate strategies, aligns with standard metrics (Spearman correlation > 0.9), and remains practically efficient. An MNIST case study shows concrete benefits: MPCS-based model selection reduces dangerous misclassifications by 0.53% at a cost of 0.04 percentage points of training accuracy, and MPCS-driven learning-rate adaptation yields a 1.62% lower MPCS with 0.36% fewer dangerous cases. Code is provided for reproducibility.
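The TL;DR describes the meta pattern only at a high level. The Python sketch below shows one plausible way to build such a pattern from a single probability vector. It is not the authors' implementation: the function name `meta_pattern`, the reading of $t$ as the number of equal-width confidence intervals over $[0, 1]$, and the tie-breaking rule are all assumptions made for illustration, and the release list, release factor and final score $S$ are not modeled.

```python
from typing import Sequence, Tuple


def meta_pattern(probs: Sequence[float], k: int, t: int) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Hypothetical meta pattern for one sample: (prediction pattern, confidence pattern).

    prediction pattern: indices of the k most probable classes, most probable first.
    confidence pattern: the interval index (0 .. t-1) of each of those k probabilities,
    using t equal-width intervals over [0, 1] (an assumption, not the paper's definition).
    """
    if not 1 <= k <= len(probs):
        raise ValueError("k must lie between 1 and the number of classes")
    if t < 1:
        raise ValueError("t must be a positive number of intervals")
    # Rank classes by descending probability; ties keep the lower class index first.
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    prediction_pattern = tuple(order[:k])
    # Map each top-k confidence to its interval; min() keeps p == 1.0 in the last interval.
    confidence_pattern = tuple(min(int(probs[i] * t), t - 1) for i in prediction_pattern)
    return prediction_pattern, confidence_pattern


if __name__ == "__main__":
    # Softmax output for a hypothetical 3-class sample.
    print(meta_pattern([0.05, 0.15, 0.80], k=2, t=10))
```

For the example vector the call returns `((2, 1), (8, 1))`: class 2 is ranked first with its confidence in interval 8, i.e. $[0.8, 0.9)$, and class 1 second with its confidence in $[0.1, 0.2)$. If $t$ does play the role of an interval count, as assumed here, a larger $t$ makes the confidence pattern finer, which is at least consistent in spirit with the $t \to \infty$ limit mentioned above.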

Abstract

While advanced classifiers are increasingly used in real-world safety-critical applications, how to properly evaluate these black-box models given specific human values remains a concern in the community. Such human values include punishing errors of different severity to different degrees and accepting compromises in general performance to reduce specific dangerous cases. In this paper, we propose a novel evaluation measure, the Meta Pattern Concern Score, which introduces human values into the evaluation of multi-classifiers through an abstract representation of probabilistic predictions and an adjustable threshold for the concession in prediction confidence. Technically, we draw on the strengths and weaknesses of two common kinds of metrics, namely confusion matrix-based evaluation measures and loss values, so that our measure is as effective as they are even on general tasks, and the cross-entropy loss becomes a special case of our measure in the limit. Our measure can also be used to refine model training by dynamically adjusting the learning rate. Experiments on four kinds of models and six datasets confirm the effectiveness and efficiency of our measure. A case study further shows that it can not only find the ideal model, reducing dangerous cases by 0.53% while sacrificing only 0.04% of training accuracy, but also refine the learning rate to train a new model that, on average, outperforms the original one with a 1.62% lower MPCS value and 0.36% fewer dangerous cases.
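The TL;DR reports that MPCS agrees with standard metrics with a Spearman correlation above 0.9. The self-contained Python sketch below shows how such rank agreement can be computed between two metric series, for example the per-epoch values of MPCS and cross-entropy along one training run. That comparison setup is our assumption and the paper's exact protocol may differ; the helper names `average_ranks` and `spearman` are hypothetical, and the example numbers are invented rather than taken from the paper. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined rank correlation")
    return cov / (sx * sy)


if __name__ == "__main__":
    mpcs = [0.90, 0.62, 0.48, 0.41, 0.39]  # invented per-epoch values
    ce = [1.20, 0.75, 0.52, 0.47, 0.44]    # invented per-epoch values
    # Both series decrease monotonically, so rho is 1 up to floating-point rounding.
    print(spearman(mpcs, ce))
```

Because Spearman's rho compares only orderings, it suits metrics on different scales such as MPCS and cross-entropy: two measures can agree on which epoch or model is better even when their raw values are not comparable.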

Paper Structure (17 sections, 10 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: The figure shows a simplified traffic light classification problem. The values illustrate how three different predictions for a "Red" light (predicting it as "Red", "Yellow" or "Green") contribute to different metrics, corresponding to a concept formally defined later as the concern degree. "Others" covers MS and all evaluation measures derived from the confusion matrix.
  • Figure 2: The figure illustrates the idea of setting a threshold on prediction confidence (e.g., 1 - 0.02 = 0.98) to indicate the tolerable concession, and of dividing the confidence range into intervals so that the concession is extended interval by interval and the corresponding punishment values are calculated.
  • Figure 3: The figure shows the calculation process of MPCS. The meta pattern consists of the prediction pattern and the confidence pattern, both constructed from the probabilistic predictions of the target multi-classifier. Given specific human values, the concern degrees and interval punishments are calculated from the meta pattern and then used to calculate the final MPCS value.
  • Figure 4: Corresponding to Table \ref{tab:Similarity}, the panels show the trends of MPCS and the benchmark metrics across the ten training processes, illustrating their similarity in general practice. For easier visual comparison, the values of MS, CE and MPCS are normalized.
  • Figure 5: The figures show the confusion matrices of the predictions of two multi-classifiers, one picked by the benchmark metrics (which all pick the same model in this case) and one picked by MPCS given specific human values. The wrong predictions framed in red are less destructive according to t-SNE (van2008visualizing). The model picked by MPCS reduces dangerous cases by 0.53% while sacrificing only 0.04% of training accuracy.
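Figures 1 and 5 describe the trade-off MPCS is designed to capture: some errors (such as a red light predicted as green) matter far more than others, and a small loss in accuracy can be worth fewer dangerous cases. The Python sketch below is not MPCS. It only illustrates how the two quantities Figure 5 reports, accuracy and the share of dangerous cases, can be read off a confusion matrix, plus one hypothetical selection rule (`pick`, with tolerance `max_acc_drop`) that trades them off. The class order Red/Yellow/Green, the choice of (Red, Green) as the only dangerous pair, and both confusion matrices are invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Matrix = List[List[int]]  # rows: true class, columns: predicted class


def _total(cm: Matrix) -> int:
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return total


def accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    return sum(cm[i][i] for i in range(len(cm))) / _total(cm)


def dangerous_rate(cm: Matrix, dangerous: Iterable[Tuple[int, int]]) -> float:
    """Fraction of all samples whose (true, predicted) pair is flagged as dangerous."""
    return sum(cm[t][p] for t, p in dangerous) / _total(cm)


def pick(models: Dict[str, Matrix], dangerous: List[Tuple[int, int]], max_acc_drop: float) -> str:
    """Hypothetical rule: among models within max_acc_drop of the best accuracy,
    return the one with the lowest dangerous rate."""
    best = max(accuracy(cm) for cm in models.values())
    eligible = [n for n, cm in models.items() if best - accuracy(cm) <= max_acc_drop]
    return min(eligible, key=lambda n: dangerous_rate(models[n], dangerous))


RED, YELLOW, GREEN = 0, 1, 2
DANGEROUS = [(RED, GREEN)]  # assumption: a red light read as green is the dangerous error

models = {
    "A": [[95, 4, 1], [3, 96, 1], [0, 2, 98]],  # invented: more accurate, one Red->Green
    "B": [[95, 5, 0], [3, 95, 2], [0, 2, 98]],  # invented: slightly less accurate, no Red->Green
}

if __name__ == "__main__":
    for name, cm in models.items():
        print(name, round(accuracy(cm), 4), round(dangerous_rate(cm, DANGEROUS), 4))
    print(pick(models, DANGEROUS, max_acc_drop=0.005))
```

With a tolerance of 0.005, `pick` prefers B, which gives up one correct prediction out of 300 to eliminate the Red-to-Green error; with a tolerance of 0.001 it keeps A. A raw count like this treats every dangerous error alike and ignores prediction confidence, which are exactly the aspects the concern degrees and interval punishments of MPCS are described as addressing.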