Selecting a classification performance measure: matching the measure to the problem
David J. Hand, Peter Christen, Sumayya Ziyad
TL;DR
The paper argues that the choice of a classifier performance measure must reflect the specific aims and constraints of the problem rather than defaulting to common metrics. It distinguishes structural properties of measures from properties driven by the aims of the problem, and sets out a framework for assessing measures in terms of the confusion matrix. A wide range of crisp binary measures is catalogued with their definitions and properties, and the authors advocate tailoring the choice of measure to the task, including handling unknown class distributions and avoiding misplaced reliance on threshold-averaged metrics. The work offers practical guidance to help researchers articulate their aims, constraints, and justification when evaluating classification methods, with the goal of reducing misinterpretation and misapplication across domains.
Abstract
The problem of identifying to which of a given set of classes objects belong is ubiquitous, occurring in many research domains and application areas, including medical diagnosis, financial decision making, online commerce, and national security. But such assignments are rarely perfect, and classification errors occur. This means it is necessary to compare classification methods and algorithms to decide which is "best" for any particular problem. However, just as there are many different classification methods, so there are many different ways of measuring their performance. It is thus vital to choose a measure of performance which matches the aims of the research or application. This paper is a contribution to the growing literature on the relative merits of different performance measures. Its particular focus is the critical importance of matching the properties of the measure to the aims for which the classification is being made.
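
To make the point about matching the measure to the aims concrete, the following minimal sketch compares two classifiers on the same test set using four standard confusion-matrix measures. The confusion-matrix counts, the names `classifier_a` and `classifier_b`, and the helper `preferred` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a crisp binary classifier on one test set."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Hypothetical results on the same 1000 objects (100 positives, 900 negatives).
classifier_a = ConfusionMatrix(tp=50, fn=50, fp=10, tn=890)
classifier_b = ConfusionMatrix(tp=90, fn=10, fp=80, tn=820)


def preferred(measure_name: str) -> str:
    """Return which hypothetical classifier scores higher on the named measure."""
    score_a = getattr(classifier_a, measure_name)()
    score_b = getattr(classifier_b, measure_name)()
    return "A" if score_a > score_b else "B"


if __name__ == "__main__":
    for name in ("accuracy", "precision", "recall", "f1"):
        a = getattr(classifier_a, name)()
        b = getattr(classifier_b, name)()
        print(f"{name:<9} A={a:.3f}  B={b:.3f}  preferred: {preferred(name)}")
```

With these illustrative counts, accuracy and precision favour classifier A while recall and F1 favour classifier B, so which classifier counts as "best" depends on which measure reflects the aims of the application.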
