Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

David M. W. Powers

TL;DR

Elegant connections between the concepts of Informedness, Markedness, Correlation and Significance, as well as their intuitive relationships with Recall and Precision, are demonstrated.

Abstract

Commonly used evaluation measures, including Recall, Precision, F-Measure and Rand Accuracy, are biased and should not be used without a clear understanding of the biases and a corresponding identification of chance or base case levels of the statistic. A system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance (Informedness), and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance, as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
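The bias claim in the abstract can be illustrated with a small sketch. It assumes the usual dichotomous definitions (Informedness = Recall + Inverse Recall - 1, Markedness = Precision + Inverse Precision - 1, Correlation as the Matthews correlation of the 2x2 table). The function name `binary_measures` and both contingency tables are hypothetical, invented purely for illustration and not taken from the paper.

```python
from math import sqrt


def binary_measures(tp, fn, fp, tn):
    """Return common measures for a 2x2 contingency table (hypothetical helper).

    Assumes every row and column total is non-zero, so no ratio is undefined.
    """
    n = tp + fn + fp + tn
    recall = tp / (tp + fn)              # true positive rate
    inverse_recall = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)           # positive predictive value
    inverse_precision = tn / (tn + fn)   # negative predictive value
    # Matthews correlation: table determinant over the geometric mean of marginals.
    det = tp * tn - fp * fn
    correlation = det / sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return {
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / n,
        "informedness": recall + inverse_recall - 1,
        "markedness": precision + inverse_precision - 1,
        "correlation": correlation,
    }


# Invented data: 100 cases, 90 real positives and 10 real negatives.
# System "a" labels almost everything positive; system "b" discriminates.
a = binary_measures(tp=88, fn=2, fp=9, tn=1)
b = binary_measures(tp=72, fn=18, fp=1, tn=9)

for name in ("recall", "f1", "accuracy", "informedness", "markedness", "correlation"):
    print(f"{name:13s} a={a[name]:.3f}  b={b[name]:.3f}")
```

With these made-up counts, system `a` scores higher on Recall, F1 and accuracy simply by following the prevalence of the positive class, while system `b` scores far higher on Informedness. In this dichotomous sketch the squared correlation also equals the product of Informedness and Markedness, which is one of the connections the abstract refers to.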

Paper Structure

This paper contains 25 sections, 21 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Systematic and traditional notations in a binary contingency table. Shading indicates correct (light=green) and incorrect (dark=red) rates or counts in the contingency table.
  • Figure 2: Illustration of ROC Analysis. The main diagonal represents chance with parallel isocost lines representing equal cost-performance. Points above the diagonal represent performance better than chance, those below worse than chance. For a single good (dotted=green) system, AUC is area under curve (trapezoid between green line and $\mathrm{x}=[0,1]$). The perverse (dashed=red) system shown is the same (good) system with class labels reversed.
  • Figure 3: Accuracy of traditional measures. 110 Monte Carlo simulations with 11 stepped expected Informedness levels (red) with Bookmaker-estimated Informedness (red dot), Markedness (green dot) and Correlation (blue dot), and showing (dashed) Kappa versus the biased traditional measures Rank Weighted Average (Wav), Geometric Mean (Gav) and Harmonic Mean F1 (Fav). The Determinant (D) and Evenness $k$-th roots ($\mathrm{gR}=\mathrm{PrevG}$ and $\mathrm{gP}=\mathrm{BiasP}$) are shown +1. $\mathrm{K}=4$, $\mathrm{N}=128$. (Online version has figures in colour.)
  • Figure 4: Chi-squared against degrees of freedom cumulative density isocontours
  • Figure 5: Illustration of significance and Cramer's V. 110 Monte Carlo simulations with 11 stepped expected Informedness (red) levels with Bookmaker-estimated Informedness (red dots), Markedness (green dot) and Correlation (blue dot), with significance (p+1) calculated using $\mathrm{G}^{2}$, $\mathrm{X}^{2}$, and Fisher estimates, and (skewed) Cramer's V Correlation estimates calculated from both $\mathrm{G}^{2}$ and $\mathrm{X}^{2}$. Here $\mathrm{K}=4$, $\mathrm{N}=128$, $\mathrm{X}=1.96$, $\alpha=\beta=0.05$.
  • ...and 2 more figures