Hierarchical confusion matrix for classification performance evaluation
Kevin Riehl, Michael Neunteufel, Martin Hemberg
TL;DR
The paper introduces a hierarchical confusion matrix that adapts traditional binary evaluation metrics to hierarchical classification, so that evaluation respects the hierarchical structure and path-level correctness. It generalizes the approach to trees and directed acyclic graphs (DAGs), to single-path and multi-path labeling, and to non-mandatory leaf-node prediction, unifying evaluation across diverse hierarchical problems. On three real-world benchmarks, the authors show that measures based on the hierarchical confusion matrix yield meaningful, interpretable rankings and differ from conventional metrics in structure-sensitive settings. The work provides a practical, open-source implementation to facilitate standardized evaluation of hierarchical classifiers across domains.
Abstract
In this work, we propose the hierarchical confusion matrix, a novel concept that makes popular confusion-matrix-based (flat) evaluation measures from binary classification applicable to hierarchical classification while accounting for its peculiarities. We develop the concept into a generalized form and prove its applicability to all types of hierarchical classification problems, including directed acyclic graphs, multi-path labelling, and non-mandatory leaf-node prediction. Finally, we use measures based on the novel confusion matrix to evaluate models in a benchmark of three real-world hierarchical classification applications and compare the results with established evaluation measures. The results demonstrate that the approach is reasonable and useful for evaluating hierarchical classification problems. The implementation of the hierarchical confusion matrix is available on GitHub.
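To make the idea of carrying binary confusion-matrix measures over to a hierarchy more concrete, the following is a minimal sketch for the simplest case (a tree with single-path labels). It is an illustrative simplification and not the definition given in the paper: it counts true positives, false positives, and false negatives by comparing the nodes on the true and predicted root-to-label paths, which is the familiar ancestor-set overlap, and it omits true negatives entirely. The toy hierarchy `PARENT` and the helper names `path_nodes`, `accumulate`, `precision`, `recall`, and `f1` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy hierarchy (assumption): child -> parent, root has parent None.
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "eagle": "bird",
}


def path_nodes(label, parent=PARENT):
    """Return the nodes on the path from the root to `label`, excluding the root."""
    nodes = set()
    while parent[label] is not None:
        nodes.add(label)
        label = parent[label]
    return nodes


@dataclass
class Counts:
    tp: int = 0  # nodes on both the true and the predicted path
    fp: int = 0  # nodes only on the predicted path
    fn: int = 0  # nodes only on the true path


def accumulate(pairs):
    """Sum node-level counts over (true_label, predicted_label) pairs."""
    c = Counts()
    for true_label, pred_label in pairs:
        t, p = path_nodes(true_label), path_nodes(pred_label)
        c.tp += len(t & p)
        c.fp += len(p - t)
        c.fn += len(t - p)
    return c


def precision(c):
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c):
    pr, rc = precision(c), recall(c)
    return 2 * pr * rc / (pr + rc) if (pr + rc) else 0.0


# One exact hit, one sibling confusion, one early stop in the wrong branch.
counts = accumulate([("dog", "dog"), ("dog", "cat"), ("eagle", "mammal")])
print(counts, precision(counts), recall(counts), f1(counts))
```

In this sketch, predicting `cat` for a true `dog` still earns credit for the shared `mammal` node, whereas a flat metric would treat it as a plain error. Extending the counts to DAGs, multi-path labels, and true negatives is where the paper's generalized formulation applies.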
