Mastering Long-Tail Complexity on Graphs: Characterization, Learning, and Generalization

Haohui Wang; Baoyu Jing; Kaize Ding; Yada Zhu; Wei Cheng; Si Zhang; Yonghui Fan; Liqing Zhang; Dawei Zhou

TL;DR

This work tackles long-tail node classification on graphs by casting it as multi-task learning over $T$ classes, one task per class, and deriving a loss-range-based generalization bound that depends on the Gaussian complexity $G(\mathcal{H}(\mathbf{X}))$ and the loss range $\text{Range}(f_1, \ldots, f_T)$. Guided by the theory, the authors propose HierTail, a two-module framework with (i) hierarchical task grouping to control task-space complexity and enable cross-class knowledge sharing, and (ii) long-tail balanced contrastive learning, which combines a supervised contrastive loss $\mathcal{L}_{SCL}$ and a balanced contrastive loss $\mathcal{L}_{BCL}$ to stabilize tail performance. The approach yields strong improvements across six real-world graphs, especially in harsh long-tail regimes, and ablations confirm that both modules are essential. The paper also analyzes hyperparameter sensitivity and scalability to demonstrate practical applicability and robustness. Overall, HierTail offers a principled, scalable solution for characterizing long-tail classes on graphs and generalizing to them.

Abstract

In the context of long-tail classification on graphs, the vast majority of existing work primarily revolves around the development of model debiasing strategies, intending to mitigate class imbalances and enhance the overall performance. Despite the notable success, there is very limited literature that provides a theoretical tool for characterizing the behaviors of long-tail classes in graphs and gaining insight into generalization performance in real-world scenarios. To bridge this gap, we propose a generalization bound for long-tail classification on graphs by formulating the problem in the fashion of multi-task learning, i.e., each task corresponds to the prediction of one particular class. Our theoretical results show that the generalization performance of long-tail classification is dominated by the overall loss range and the task complexity. Building upon the theoretical findings, we propose a novel generic framework HierTail for long-tail classification on graphs. In particular, we start with a hierarchical task grouping module that allows us to assign related tasks into hypertasks and thus control the complexity of the task space; then, we further design a balanced contrastive learning module to adaptively balance the gradients of both head and tail classes to control the loss range across all tasks in a unified fashion. Extensive experiments demonstrate the effectiveness of HierTail in characterizing long-tail classes on real graphs, which achieves up to 12.9% improvement over the leading baseline method in accuracy.
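To make the contrastive component more concrete, here is a minimal sketch of a supervised contrastive loss in plain Python. It follows the widely used supervised contrastive formulation, in which each anchor is pulled toward same-class embeddings and pushed away from all others. It is not taken from the paper: the function name, the `temperature` default, and the toy embeddings are assumptions for illustration, and HierTail's own $\mathcal{L}_{SCL}$ and $\mathcal{L}_{BCL}$ may be defined differently.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Generic supervised contrastive loss (illustrative, not the paper's exact form).

    Averages over anchors that have at least one same-class positive.
    """
    # L2-normalise every embedding so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        normed.append([v / norm for v in vec])

    n = len(normed)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of anchor i to every other node.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all non-anchor nodes, computed stably.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy check: embeddings that agree with the labels give a lower loss.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(supervised_contrastive_loss(toy, [0, 0, 1, 1]))
print(supervised_contrastive_loss(toy, [0, 1, 0, 1]))
```

Averaging over anchors treats every labeled node equally, so head classes still dominate the total. Reweighting so that head and tail classes contribute comparably is the kind of adjustment a balanced variant would make.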

Paper Structure (22 sections, 5 theorems, 16 equations, 9 figures, 6 tables, 1 algorithm)

Introduction
Preliminary
Algorithm
Theoretical Analysis
HierTail Framework
Experiments
Experiment Setup
Performance Analysis
Ablation Study
Parameter and Complexity Analysis
Related Work
Conclusion
Symbols and notations
Details of $\texttt{Ratio}_{LT}(p)$
Details of Theoretical Analysis
...and 7 more sections

Key Result

Theorem 1

Given the node embedding extraction function $h\in \mathcal{H}$ and the task-specific classifiers $f_1, \ldots, f_T \in \mathcal{F}$, with probability at least $1-\delta$, $\delta \in [0,1]$, we have where $\mathbf{X}$ is the node feature, $T$ is the number of tasks, $n_t$ is the number of nodes in task $t$, $R$ denotes the Lipschitz constant of functions in $\mathcal{F}$, loss function $l(\cdot, \cd
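The loss range that appears in the bound can be illustrated with a short sketch. It assumes the loss range is the gap between the largest and smallest per-task average loss, with one task per class. That reading of Definition 2 is an assumption, since the formula is not reproduced here, and the function name and toy numbers are hypothetical.

```python
def loss_range(per_node_losses, task_ids):
    """Return (range, per-task mean losses).

    Assumes the loss range is max_t(mean loss of task t) - min_t(mean loss of task t).
    """
    if len(per_node_losses) == 0:
        raise ValueError("at least one loss value is required")
    sums, counts = {}, {}
    for loss, task in zip(per_node_losses, task_ids):
        sums[task] = sums.get(task, 0.0) + loss
        counts[task] = counts.get(task, 0) + 1
    means = {task: sums[task] / counts[task] for task in sums}
    return max(means.values()) - min(means.values()), means


# Toy example: task 0 is a well-fit head class, tasks 1 and 2 are tail classes.
losses = [0.2, 0.4, 0.3, 1.5, 2.5]
tasks = [0, 0, 0, 1, 2]
gap, task_means = loss_range(losses, tasks)
print(gap, task_means)
```

A large gap means some tasks, typically the tail classes, are fit much worse than others, which is the quantity the balanced contrastive module is meant to keep small.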

Figures (9)

Figure 1: An illustrative figure of long-tail distribution in the collaboration network (Cora-Full), where the green and red curves show balanced accuracy (bAcc) (%) of GCN and HierTail for node classification on each class. Blue and yellow bars represent the class frequency of unlabeled and labeled nodes.
Figure 2: Comparison between two long-tail distribution metrics on (a) the hard case of the original Cora-Full dataset and (b) the easy case of the down-sampled Cora-Full dataset. We observe that the class-imbalance ratio falls short in characterizing the task complexity of two datasets, while the long-tailedness ratio does.
Figure 3: The proposed HierTail framework with $L$ task-grouping layers.
Figure 4: An illustrative figure for M1 with two task-grouping layers. Step 1: nodes are first grouped into four tasks (each representing a class). Step 2: We learn the embeddings of the task prototypes. Finally, the node embeddings are updated by back-propagation.
Figure 5: Performance on groups of ten classes in Cora-Full dataset, where the yellow, red and green curves show bAcc (%) of GCN, HierTail and GraphSMOTE_T for node classification.
...and 4 more figures

Theorems & Definitions (7)

Definition 1: Long-Tailedness Ratio
Definition 2: Loss Range
Theorem 1: Generalization Error Bound
Lemma 1: Gaussian Complexity-Based Bound
Lemma 2: Property of Gaussian Complexity, Corollary 11 in Maurer16Benefit
Lemma 3: Chain Rule of Gaussian Complexity
Theorem 1: Generalization Error Bound
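The long-tailedness ratio of Definition 1 can be sketched as follows. The sketch assumes $\texttt{Ratio}_{LT}(p) = Q(p) / (T - Q(p))$, where $Q(p)$ is the smallest number of classes, taken from largest to smallest, needed to cover a fraction $p$ of the labeled nodes. That formula is recalled rather than quoted and may differ in detail from the paper's definition. The class-imbalance ratio is computed as smallest class size over largest, which is one common convention. Function names and toy data are hypothetical.

```python
from collections import Counter


def long_tailedness_ratio(labels, p=0.8):
    """Q(p) / (T - Q(p)) under the assumed definition; lower means more long-tailed."""
    if len(labels) == 0:
        raise ValueError("labels must be non-empty")
    counts = sorted(Counter(labels).values(), reverse=True)
    total, num_classes = sum(counts), len(counts)
    covered, q = 0, 0
    for size in counts:
        covered += size
        q += 1
        if covered >= p * total:
            break
    if q == num_classes:
        return float("inf")  # every class is needed to reach the quantile
    return q / (num_classes - q)


def class_imbalance_ratio(labels):
    """Smallest class size divided by largest class size (one common convention)."""
    counts = Counter(labels).values()
    return min(counts) / max(counts)


# Toy label sets with six classes each.
skewed = [c for c, n in enumerate([70, 20, 4, 3, 2, 1]) for _ in range(n)]
balanced = [c for c in range(6) for _ in range(10)]
print(long_tailedness_ratio(skewed), class_imbalance_ratio(skewed))
print(long_tailedness_ratio(balanced), class_imbalance_ratio(balanced))
```

Unlike the imbalance ratio, which looks only at the two extreme classes, this ratio depends on how many classes sit in the tail, which is the distinction the comparison in Figure 2 draws.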
