HiLight: A Hierarchy-aware Light Global Model with Hierarchical Local ConTrastive Learning

Zhijian Chen, Zhonghua Li, Jianxin Yang, Ye Qi

TL;DR

HiLight addresses the scalability issues of hierarchy-aware HTC by eliminating the structure encoder and instead using Hierarchical Local Contrastive Learning (HiLCL) within a lightweight global model composed of a text encoder and a multi-label head. HiLCL combines Local Contrastive Learning with a Hierarchical Learning schedule (HiLearn) to enforce discriminative, path-consistent behavior among labels, especially at finer granularity. Experiments on WOS and RCV1-v2 show competitive Micro-F1 and Macro-F1 scores while achieving superior parameter efficiency and robustness to collapse, outperforming several structure-encoder baselines on key metrics. The work demonstrates that hierarchical information can be effectively injected through task-driven contrastive learning without increasing model size, offering a scalable alternative for HTC in large taxonomies.
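The "lightweight global model" has only two trainable parts: a text encoder and a multi-label head. The sketch below shows just the head and a loss. It assumes the encoder yields one pooled vector per document and that the MLC task uses binary cross-entropy; neither detail is stated here. The random matrix `h` stands in for BERT output (768 mirrors BERT-base's hidden size). `multilabel_head`, `bce_loss` and `head_param_count` are hypothetical names.

```python
import numpy as np

def multilabel_head(h, W, b):
    """Linear multi-label head: one independent sigmoid score per label."""
    logits = h @ W.T + b  # (batch, num_labels)
    return 1.0 / (1.0 + np.exp(-logits))

def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross-entropy, assumed here as the MLC training objective."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)))

def head_param_count(hidden_size, num_labels):
    """Weights plus biases of the head: grows linearly with the label count."""
    return num_labels * hidden_size + num_labels

# Random stand-in for pooled text-encoder output (hypothetical data).
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 768))
W = rng.normal(scale=0.02, size=(5, 768))
b = np.zeros(5)
probs = multilabel_head(h, W, b)
targets = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 1, 0]], dtype=float)
print(probs.shape, bce_loss(probs, targets))
print(head_param_count(768, 5))
```

Because no structure encoder is present, the only label-dependent parameters in this sketch are the head's `num_labels * hidden_size + num_labels` values. This illustrates the design choice but does not reproduce the sizes plotted in the figures.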

Abstract

Hierarchical text classification (HTC) is a special sub-task of multi-label classification (MLC) in which the taxonomy is organized as a tree and each sample is assigned at least one path in the tree. Recent HTC models contain three modules: a text encoder, a structure encoder, and a multi-label classification head. Specifically, the structure encoder is designed to encode the hierarchy of the taxonomy. However, the structure encoder has a scalability problem: as the taxonomy size increases, the learnable parameters of recent HTC works grow rapidly. Recursive regularization is another widely used method to introduce hierarchical information, but it suffers from a collapse problem and is generally relaxed by assigning it a small weight (i.e., 1e-6). In this paper, we propose a Hierarchy-aware Light Global model with Hierarchical local conTrastive learning (HiLight), a lightweight and efficient global model consisting only of a text encoder and a multi-label classification head. We propose a new learning task to introduce the hierarchical information, called Hierarchical Local Contrastive Learning (HiLCL). Extensive experiments on two benchmark datasets demonstrate the effectiveness of our model.
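Recursive regularization, as commonly formulated in hierarchical classification, penalizes the squared distance between the classifier weights of each parent label and each of its children. The sketch below is illustrative and is not the paper's code. `recursive_regularization` and the edge-list format are hypothetical. It also shows one plausible reading of the collapse problem. The penalty is zero when every connected label shares the same weight vector, so a large weight pushes labels toward indistinguishable classifiers. A weight of 1e-6 makes the term's contribution negligible.

```python
import numpy as np

def recursive_regularization(W, edges, weight=1e-6):
    """Sum of 0.5 * ||w_parent - w_child||^2 over taxonomy edges, scaled by weight.

    W: (num_labels, d) matrix of per-label classifier weights.
    edges: iterable of (parent_index, child_index) pairs.
    """
    penalty = 0.0
    for parent, child in edges:
        diff = W[parent] - W[child]
        penalty += 0.5 * float(diff @ diff)
    return weight * penalty

# Toy taxonomy (hypothetical): label 0 is the parent of labels 1 and 2.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (0, 2)]
print(recursive_regularization(W, edges, weight=1.0))  # 1.5
print(recursive_regularization(W, edges))  # scaled down by the small default weight
# Identical rows drive the penalty to zero, the degenerate optimum behind "collapse".
print(recursive_regularization(np.ones((3, 2)), edges, weight=1.0))  # 0.0
```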

Paper Structure (22 sections, 13 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Learnable parameter size (in MB) of recent HTC works at different taxonomy sizes. All models adopt BERT as the text encoder.
  • Figure 2: Illustration of HiLight. Given an input text, HiLight infers label probabilities with a text encoder and a multi-label classification head. With the inferred label probabilities and the positive labels, HiLight conducts label space learning with the MLC and HiLCL tasks. HiLCL, our proposed method, divides multi-label classification learning into multiple single-label classification tasks. HiLCL then improves contrastive learning on each single-label task with Local Hard Negative Sampling, which draws negative labels from the sibling and descendant label sets of the positive label. Negative labels outside the sibling and descendant label sets are masked out during learning. HiLCL schedules learning with the Hierarchical Learning strategy (HiLearn), which adopts a fine-to-coarse learning order to improve the discrimination of the finest-grained labels.
  • Figure 3: An example of HiLCL with a target size of 3. First, HiLCL divides multi-label classification learning into 3 single-label classification tasks. For each positive label, HiLCL conducts the LCL task, which masks out the outputs of the other positive labels as well as the easy negative labels, and then contrasts the output of the current positive label with the outputs of the hard negative labels. Meanwhile, HiLCL schedules LCL learning with HiLearn, which learns the finest-grained positive labels in early epochs and gradually adds coarser-grained positive labels.
  • Figure 4: Learnable parameter size of recent HTC models at different taxonomy sizes. All models adopt BERT as the text encoder.
  • Figure 5: t-SNE visualization of the label space mapping on RCV1-v2. Green dots indicate the input text. Blue dots indicate positive labels. Yellow dots indicate negative sibling and descendant labels. Yellow triangles indicate negative descendant labels. Yellow squares indicate negative sibling labels. Grey dots indicate easy negative labels.
  • ...and 1 more figure
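The captions of Figures 2 and 3 describe HiLCL in three steps:

1. Split learning into one single-label task per positive label.
2. Contrast each positive only against its hard negatives, meaning siblings and descendants that are not themselves positive, and mask every other output.
3. Schedule the positives fine-to-coarse with HiLearn.

The code below is a minimal sketch under stated assumptions. The toy taxonomy and all function names are hypothetical. The contrast is written as a softmax cross-entropy over the unmasked logits. HiLearn is modeled as unlocking one coarser level per epoch. The paper's actual loss form and schedule may differ.

```python
import numpy as np

# Hypothetical toy taxonomy: label -> parent (None marks a top-level label).
PARENT = {"A": None, "B": None, "A1": "A", "A2": "A", "B1": "B", "B2": "B",
          "A1a": "A1", "A1b": "A1"}
LABELS = list(PARENT)
INDEX = {name: i for i, name in enumerate(LABELS)}

def depth(label):
    """Depth in the tree, with top-level labels at depth 1."""
    d = 1
    while PARENT[label] is not None:
        label = PARENT[label]
        d += 1
    return d

def siblings(label):
    return {l for l, p in PARENT.items() if p == PARENT[label] and l != label}

def descendants(label):
    out, frontier = set(), [label]
    while frontier:
        node = frontier.pop()
        children = [l for l, p in PARENT.items() if p == node]
        out.update(children)
        frontier.extend(children)
    return out

def lcl_loss(logits, positive, all_positives):
    """Local contrastive loss for one positive label (assumed softmax form)."""
    # Local hard negatives: siblings and descendants that are not positive.
    hard_negatives = (siblings(positive) | descendants(positive)) - set(all_positives)
    keep = [INDEX[positive]] + sorted(INDEX[l] for l in hard_negatives)
    # Other positives and easy negatives are masked out by leaving them out of `keep`.
    z = logits[keep]
    z = z - z.max()  # numerical stability
    return -(z[0] - np.log(np.exp(z).sum()))

def hilearn_active(positives, epoch):
    """Hypothetical fine-to-coarse schedule: one extra coarser level per epoch."""
    finest = max(depth(l) for l in positives)
    return [l for l in positives if depth(l) >= finest - epoch]

def hilcl_loss(logits, positives, epoch):
    """Average LCL loss over the positives unlocked by the schedule."""
    active = hilearn_active(positives, epoch)
    return sum(lcl_loss(logits, p, positives) for p in active) / len(active)

logits = np.zeros(len(LABELS))
positives = ["A", "A1", "A1a"]  # one full path A -> A1 -> A1a
for epoch in range(3):
    print(epoch, hilearn_active(positives, epoch), hilcl_loss(logits, positives, epoch))
```

In this sketch, the logit of an easy negative such as `B1` never enters the loss for the path `A -> A1 -> A1a`. Only local siblings and descendants compete with each positive, which is the behavior the captions attribute to Local Hard Negative Sampling.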