LH-Mix: Local Hierarchy Correlation Guided Mixup over Hierarchical Prompt Tuning

Fanshuang Kong, Richong Zhang, Ziqiao Wang

TL;DR

This work tackles hierarchical text classification (HTC) by leveraging a text-specific local hierarchy and introducing LH-Mix, which fuses depth-level hierarchical prompts with Mixup to model latent correlations among sibling labels. Local hierarchies are represented as sequences; a similarity measure based on the local-hierarchy CLS representation guides an adaptive Mixup ratio $\boldsymbol{\lambda} = -(\beta - 0.5) s^{\alpha} + \beta$, enabling more informative in-between samples. LH-Mix applies Mixup to both inputs and losses across each hierarchy depth, using a zero-bounded multi-label cross-entropy loss and loss mixing to optimize the model. Experiments on WOS, NYT, and RCV1-V2 demonstrate strong, consistent gains over state-of-the-art baselines, particularly in deeper and sparser hierarchies, underscoring the value of local-hierarchy-aware augmentation for HTC.
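
As a rough illustration of the adaptive ratio, the sketch below evaluates $\lambda = -(\beta - 0.5) s^{\alpha} + \beta$ for a few similarity values. The function name `mixup_ratio`, the default values $\alpha = \beta = 1$, and the assumption that the similarity $s$ lies in $[0, 1]$ are illustrative choices, not details taken from the paper's code.

```python
def mixup_ratio(s: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Evaluate lambda = -(beta - 0.5) * s**alpha + beta.

    `s` is the local-hierarchy similarity between two samples.
    Assumption (not from the source): s has been scaled to [0, 1].
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("this sketch assumes s lies in [0, 1]")
    return -(beta - 0.5) * s ** alpha + beta


if __name__ == "__main__":
    # At s = 0 the ratio equals beta; at s = 1 it equals 0.5 for any alpha, beta.
    for s in (0.0, 0.5, 1.0):
        print(f"s={s:.1f} -> lambda={mixup_ratio(s):.3f}")
```

Under these assumptions the two endpoints are fixed by the formula itself: $s = 0$ gives $\lambda = \beta$, and $s = 1$ gives $\lambda = 0.5$, an even blend of the two samples. The exponent $\alpha$ only changes how quickly the ratio moves between those endpoints.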

Abstract

Hierarchical text classification (HTC) aims to assign one or more labels in the hierarchy for each text. Many methods represent this structure as a global hierarchy, leading to redundant graph structures. To address this, incorporating a text-specific local hierarchy is essential. However, existing approaches often model this local hierarchy as a sequence, focusing on explicit parent-child relationships while ignoring implicit correlations among sibling/peer relationships. In this paper, we first integrate local hierarchies into a manual depth-level prompt to capture parent-child relationships. We then apply Mixup to this hierarchical prompt tuning scheme to improve the latent correlation within sibling/peer relationships. Notably, we propose a novel Mixup ratio guided by local hierarchy correlation to effectively capture intrinsic correlations. This Local Hierarchy Mixup (LH-Mix) model demonstrates remarkable performance across three widely used datasets.
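
To make the Mixup step concrete, here is a minimal sketch of generic Mixup applied to a pair of representations and to their losses with a shared ratio. It is not the authors' implementation: the helper names `mix_vectors` and `mix_losses`, the toy vectors, and the loss values are hypothetical, and the sketch does not model the depth-level prompt or the specific loss function used in the paper.

```python
from typing import List


def mix_vectors(h_i: List[float], h_j: List[float], lam: float) -> List[float]:
    """Convex combination lam * h_i + (1 - lam) * h_j of two equal-length vectors."""
    if len(h_i) != len(h_j):
        raise ValueError("vectors must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(h_i, h_j)]


def mix_losses(loss_i: float, loss_j: float, lam: float) -> float:
    """Blend the losses computed against each sample's labels with the same ratio."""
    return lam * loss_i + (1.0 - lam) * loss_j


if __name__ == "__main__":
    h_i = [1.0, 0.0]   # toy representation of sample i (hypothetical values)
    h_j = [0.0, 1.0]   # toy representation of sample j (hypothetical values)
    lam = 0.75         # Mixup ratio, e.g. produced by a similarity-guided rule

    mixed = mix_vectors(h_i, h_j, lam)
    # In practice the two losses would come from scoring the mixed
    # representation against the labels of sample i and of sample j.
    loss = mix_losses(2.0, 4.0, lam)
    print(mixed, loss)
```

Using the same ratio for the representation and for the loss keeps the training target consistent with how much each sample contributes to the mixed input.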

Paper Structure

This paper contains 31 sections, 8 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: (a) A toy example of global hierarchy in HTC. (b) The local hierarchy of "CS/Machine Learning" and "Math/Statistics", which are extracted from (a). (c) Transformation from explicit parent-child relations (a) to spatial inclusion relations in latent space. Mixup enables the capture of varying degrees of implicit sibling/peer label correlation through different Mixup ratios $\lambda$.
  • Figure 2: Illustration of LH-Mix. The light orange color scheme represents elements of $X_i$, and the light green represents $X_j$. The mixture of orange and green represents the elements related to the Mixup operation.
  • Figure 3: Curves corresponding to Eq. \ref{eq:s_lambda}. We separately plot the effects of different $\beta$ and $\alpha$ on the function when $\alpha=1$ and $\beta=1$, respectively.
  • Figure 4: Performance at different downsampling ratios used to make the data sparser.
  • Figure 5: Performance on different $\beta$ and $\alpha$, when fixing $\alpha=1$ and $\beta=1$ respectively.
  • ...and 2 more figures