HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification
He Zhu, Junran Wu, Ruomei Liu, Yue Hou, Ze Yuan, Shangzhe Li, Yicheng Pan, Ke Xu
TL;DR
The paper tackles hierarchical text classification (HTC) with self-supervised contrastive learning, identifying that input augmentation can distort semantic content. It introduces HILL, a framework where a text encoder (BERT-based) and a structure encoder collaborate: the structure encoder builds a coding tree of the label hierarchy via structural entropy minimization and produces an information-rich positive view $h_T$ that is fused with the text view $h_D$ through a contrastive objective. The authors formalize an information lossless learning principle and prove that the mutual information preserved by HILL upper-bounds that preserved by augmentation-based methods. Empirically, HILL achieves state-of-the-art results on three HTC datasets (WOS, RCV1-v2, NYTimes), with notable improvements over baselines and efficient training due to a compact structure-encoder design. The work provides a principled, scalable path to incorporating label structure into representation learning for HTC, with practical implications for hierarchical NLP tasks.
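A minimal sketch can show how a contrastive objective pairs the text view $h_D$ with the structure view $h_T$. It assumes an NT-Xent (SimCLR-style, normalized temperature-scaled cross-entropy) loss with in-batch negatives and a hypothetical temperature `tau = 0.5`. The exact loss form, temperature, and projection heads used by HILL may differ. Each document's two views form the positive pair, and every other view in the batch acts as a negative.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(h_d: List[Vector], h_t: List[Vector], tau: float = 0.5) -> float:
    """Assumed NT-Xent loss: (h_d[i], h_t[i]) is the positive pair of document i.

    All 2N views are pooled; every view other than the anchor and its
    positive acts as an in-batch negative. `tau` is a hypothetical default.
    """
    if len(h_d) != len(h_t) or not h_d:
        raise ValueError("h_d and h_t must be non-empty and equally sized")
    views = h_d + h_t
    n = len(h_d)
    total = 0.0
    for i, anchor in enumerate(views):
        pos = (i + n) % (2 * n)  # the other view of the same document
        # Denominator: similarities to every view except the anchor itself.
        denom = sum(math.exp(cosine(anchor, views[j]) / tau)
                    for j in range(2 * n) if j != i)
        numer = math.exp(cosine(anchor, views[pos]) / tau)
        total -= math.log(numer / denom)
    return total / (2 * n)  # average over all 2N anchors


h_D = [[1.0, 0.0], [0.0, 1.0]]          # toy text views
h_T_aligned = [[0.9, 0.1], [0.1, 0.9]]  # structure views matching their documents
h_T_swapped = [[0.1, 0.9], [0.9, 0.1]]  # structure views paired with the wrong documents
print(nt_xent(h_D, h_T_aligned) < nt_xent(h_D, h_T_swapped))  # True
```

When each structure view matches its own document, the loss is lower than when the structure views are swapped between documents. That is the behavior the objective rewards: it pulls $h_D$ toward the $h_T$ of the same document and pushes it away from the others.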
Abstract
Existing self-supervised methods in natural language processing (NLP), especially in hierarchical text classification (HTC), mainly focus on self-supervised contrastive learning and rely heavily on human-designed augmentation rules to generate contrastive samples, which can corrupt or distort the original information. In this paper, we investigate the feasibility of a contrastive learning scheme in which the semantic and syntactic information inherent in the input sample is adequately preserved in the contrastive samples and fused during the learning process. Specifically, we propose an information lossless contrastive learning strategy for HTC, namely Hierarchy-aware Information Lossless contrastive Learning (HILL), which consists of a text encoder that represents the input document and a structure encoder that directly generates the positive sample. The structure encoder takes the document embedding as input, extracts the essential syntactic information inherent in the label hierarchy under the principle of structural entropy minimization, and injects this syntactic information into the text representation via hierarchical representation learning. Experiments on three common datasets verify the superiority of HILL.
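Structural entropy minimization scores a coding tree (a hierarchical partition of a graph's vertices) using the standard definition of structural entropy. Each non-root tree node $\alpha$ contributes $-\frac{g_\alpha}{\mathrm{vol}(G)}\log_2\frac{\mathrm{vol}(\alpha)}{\mathrm{vol}(\alpha^-)}$. Here $g_\alpha$ is the total weight of edges leaving $\alpha$'s vertex set, $\mathrm{vol}$ is the sum of vertex degrees, and $\alpha^-$ is the parent of $\alpha$. The sketch below only evaluates this objective for a given tree on a hypothetical toy graph (two triangles joined by one edge). It does not implement HILL's tree-construction algorithm. The function name `structural_entropy` and its input format are illustrative assumptions, and the graph is assumed undirected with no isolated vertices.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

# (u, v, weight) for an undirected edge.
Edge = Tuple[Hashable, Hashable, float]


def structural_entropy(edges: List[Edge], parent: Dict[Hashable, Hashable]) -> float:
    """Structural entropy of a coding tree over an undirected weighted graph.

    `parent` maps every non-root tree node (graph vertices are the leaves)
    to its parent node. Assumes no isolated vertices.
    """
    degree: Dict[Hashable, float] = defaultdict(float)
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    vol_g = sum(degree.values())  # vol(G) = sum of all vertex degrees

    # Leaf set under each tree node, collected by walking leaf -> root.
    members: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for leaf in degree:
        node = leaf
        members[node].add(leaf)
        while node in parent:
            node = parent[node]
            members[node].add(leaf)

    def vol(node: Hashable) -> float:
        return sum(degree[x] for x in members[node])

    def cut(node: Hashable) -> float:
        # g_alpha: total weight of edges with exactly one endpoint inside.
        inside = members[node]
        return sum(w for u, v, w in edges if (u in inside) != (v in inside))

    entropy = 0.0
    for node, par in parent.items():
        g = cut(node)
        if g > 0:  # nodes with no outgoing edges contribute nothing
            entropy -= g / vol_g * math.log2(vol(node) / vol(par))
    return entropy


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
flat = {v: "root" for v in range(6)}  # one-level tree: every vertex under the root
two_level = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B",
             "A": "root", "B": "root"}
print(f"{structural_entropy(edges, flat):.3f}")       # 2.557
print(f"{structural_entropy(edges, two_level):.3f}")  # 1.700
```

The two-community tree scores lower than the flat tree, so a minimizer would prefer it. Loosely, this is how minimizing structural entropy recovers the essential hierarchical organization of a graph. For a flat tree, the value reduces to the Shannon entropy of the normalized degree distribution.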
