Revisiting Hierarchical Text Classification: Inference and Metrics

Roman Plaud; Matthieu Labeau; Antoine Saillenfest; Thomas Bonald

TL;DR

This work proposes evaluating hierarchical text classification models with specifically designed hierarchical metrics and demonstrates how intricately the choice of metric and the prediction inference method are linked.

Abstract

Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem and therefore evaluate it as such. We instead propose to evaluate models with specifically designed hierarchical metrics, and we demonstrate the intricacy of metric choice and prediction inference method. We introduce a new challenging dataset and fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC. Code implementation and dataset are available at https://github.com/RomanPlaud/revisitingHTC.

Paper Structure (41 sections, 4 theorems, 50 equations, 8 figures, 8 tables)


Introduction
Related Work
Hierarchical Text Classification
Hierarchical prediction
Hierarchical classification evaluation
Evaluation metrics
Multi-label metrics
Hierarchical metrics
Hierarchical F1-score
Other hierarchical metrics
Inference methodology
Simple conditional loss-based methods
Conditional softmax cross-entropy
Logit-adjusted conditional softmax
Conditional sigmoid binary cross-entropy
...and 26 more sections

Key Result

Proposition 1

In the micro and samples settings, if every prediction $\hat{Y}$ is coherent, then hF1 and F1 are strictly equal.
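The intuition behind Proposition 1 can be checked on a toy example. The sketch below rests on two assumptions that this summary does not spell out: that a "coherent" prediction is one that contains every ancestor of each predicted node, and that hF1 is the F1-score computed after augmenting both the true and the predicted label sets with all their ancestors. The hierarchy, the label names, and the helper functions are hypothetical and only serve the illustration; this is not the authors' implementation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy: child -> parent (None marks a top-level node).
PARENT: Dict[str, Optional[str]] = {
    "Science": None,
    "Physics": "Science",
    "Optics": "Physics",
    "Arts": None,
    "Music": "Arts",
}


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Return the labels together with all of their ancestors."""
    closed: Set[str] = set()
    for node in labels:
        current: Optional[str] = node
        while current is not None:
            closed.add(current)
            current = PARENT[current]
    return closed


def micro_f1(trues: List[Set[str]], preds: List[Set[str]]) -> float:
    """Micro-averaged F1: counts are pooled over all samples."""
    tp = sum(len(t & p) for t, p in zip(trues, preds))
    n_pred = sum(len(p) for p in preds)
    n_true = sum(len(t) for t in trues)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_true
    return 2 * precision * recall / (precision + recall)


def micro_hf1(trues: List[Set[str]], preds: List[Set[str]]) -> float:
    """Assumed hF1: micro F1 on ancestor-augmented label sets."""
    return micro_f1(
        [with_ancestors(t) for t in trues],
        [with_ancestors(p) for p in preds],
    )


# True label sets are full paths, hence already closed under ancestors.
trues = [{"Science", "Physics", "Optics"}, {"Arts", "Music"}]
# Coherent: every predicted node comes with its ancestors.
coherent_preds = [{"Science", "Physics"}, {"Arts", "Music"}]
# Incoherent: leaves are predicted without their ancestors.
incoherent_preds = [{"Optics"}, {"Music"}]

f1_coherent = micro_f1(trues, coherent_preds)
hf1_coherent = micro_hf1(trues, coherent_preds)
f1_incoherent = micro_f1(trues, incoherent_preds)
hf1_incoherent = micro_hf1(trues, incoherent_preds)
```

With coherent predictions the ancestor augmentation leaves every set unchanged, so the two scores coincide; with the incoherent predictions the augmentation adds the missing ancestors and the two scores diverge.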

Figures (8)

Figure 1: Extract of the taxonomy of our new dataset Hierarchical WikiVitals. Each colored path is the set of labels of the same color.
Figure 2: Example of a conditional distribution estimation over a simple hierarchy and corresponding predicted nodes (in blue) for different thresholds ($0.3$ on the left, $0.5$ on the right).
Figure 3: Averaged Macro F1-Scores on the test set per depth for different models and for the HWV dataset. The error bars represent a $95\%$ confidence interval.
Figure 4: Averaged Macro F1-Scores on the test set by quantiles of label counts distribution in the training set for different models and for the HWV dataset. The shaded regions represent a $95\%$ confidence interval.
Figure 5: Number of nodes per depth for the HWV dataset. The hatched histogram corresponds to leaf nodes.
...and 3 more figures
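The thresholding described in the caption of Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: it assumes the model outputs a conditional probability for each node given its parent, that a node's probability is the product of these conditionals along its path from the top of the hierarchy, and that a node is predicted when this probability is at least the threshold. The hierarchy, the probability values, and the function names are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: child -> parent (None marks a top-level node).
PARENT: Dict[str, Optional[str]] = {
    "A": None,
    "B": None,
    "A1": "A",
    "A2": "A",
    "B1": "B",
}

# Hypothetical conditional probabilities P(node | parent, text).
# Siblings sum to 1, as a conditional softmax would produce.
COND: Dict[str, float] = {
    "A": 0.6,
    "B": 0.4,
    "A1": 0.7,
    "A2": 0.3,
    "B1": 1.0,
}


def node_probability(node: str) -> float:
    """Multiply the conditional probabilities along the path to the top."""
    prob = 1.0
    current: Optional[str] = node
    while current is not None:
        prob *= COND[current]
        current = PARENT[current]
    return prob


def predict(threshold: float) -> Set[str]:
    """Predict every node whose probability reaches the threshold."""
    return {n for n in PARENT if node_probability(n) >= threshold}


low_threshold_prediction = predict(0.3)
high_threshold_prediction = predict(0.5)
```

Under these assumptions a node's probability never exceeds its parent's, so every thresholded set contains the ancestors of each of its nodes. A lower threshold can select nodes from several branches, while a threshold of $0.5$ selects at most one node per level when sibling probabilities sum to 1.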

Theorems & Definitions (4)

Proposition 1
Proposition 2
Proposition 3
Proposition 4
