Automated Loss function Search for Class-imbalanced Node Classification

Xinyu Guo; Kai Wu; Xiaoyu Zhang; Jing Liu

TL;DR

AutoLINC introduces an automated loss-function search framework for class-imbalanced node classification, leveraging Monte Carlo Tree Search to explore a CFG-based loss expression space and a loss-check strategy to prune ineffective candidates. By encoding the loss as a parse tree with inputs $\hat{y}$, $y$, and class counts $N$, and evaluating candidates via a lightweight proxy task, AutoLINC discovers losses that outperform state-of-the-art hand-crafted losses across multiple GNN backbones and datasets. The approach demonstrates strong performance, transferability among homogeneous graph types, and notable speedups from its pruning strategies, while remaining adaptable to different graph domains and imbalance levels. The work highlights the importance of homophily for transferability and shows that combining loss-search with graph-aware baselines like GraphSHA yields scalable gains in imbalanced node classification.
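The idea of representing a loss as a parse tree over $\hat{y}$, $y$, and $N$, and rejecting poor candidates before any training, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the toy grammar, the operator set, the names `sample_expr`, `node_loss`, and `passes_check`, and the specific check (a confident correct prediction must score a lower loss than a confident wrong one) are hypothetical and are not taken from the paper. Uniform random sampling stands in for the Monte Carlo Tree Search.

```python
import math
import random

# Hypothetical toy grammar: E -> (E op E) | unary(E) | terminal.
# The operator set is an assumption, not the paper's search space.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
UNARY = {
    "neg": lambda a: -a,
    "log": lambda a: math.log(abs(a) + 1e-8),  # guarded log, always finite
}
TERMINALS = ("y_hat", "y", "n", "1")


def sample_expr(rng, depth=0, max_depth=3):
    """Sample a random parse tree, stored as a nested tuple or a terminal name."""
    if depth >= max_depth or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    if rng.random() < 0.5:
        return (rng.choice(sorted(UNARY)), sample_expr(rng, depth + 1, max_depth))
    return (
        rng.choice(sorted(BINARY)),
        sample_expr(rng, depth + 1, max_depth),
        sample_expr(rng, depth + 1, max_depth),
    )


def evaluate(expr, env):
    """Recursively evaluate a parse tree given values for the terminals."""
    if isinstance(expr, str):
        return env[expr]
    if len(expr) == 2:
        return UNARY[expr[0]](evaluate(expr[1], env))
    return BINARY[expr[0]](evaluate(expr[1], env), evaluate(expr[2], env))


def node_loss(expr, probs, label, counts):
    """Sum the per-class expression over all classes for a single node."""
    total = 0.0
    for c, p in enumerate(probs):
        env = {
            "y_hat": p,                        # predicted probability of class c
            "y": 1.0 if c == label else 0.0,   # one-hot target entry
            "n": float(counts[c]),             # number of training nodes in class c
            "1": 1.0,
        }
        total += evaluate(expr, env)
    return total


def passes_check(expr, counts):
    """Hypothetical check: a good prediction must give a lower, finite loss."""
    k = len(counts)
    for label in range(k):
        good = [0.9 if c == label else 0.1 / (k - 1) for c in range(k)]
        bad = [0.1 if c == label else 0.9 / (k - 1) for c in range(k)]
        loss_good = node_loss(expr, good, label, counts)
        loss_bad = node_loss(expr, bad, label, counts)
        if not (math.isfinite(loss_good) and math.isfinite(loss_bad)):
            return False
        if not loss_good < loss_bad:
            return False
    return True


# Cross-entropy written in this toy grammar: -(y * log(y_hat)).
cross_entropy = ("neg", ("*", "y", ("log", "y_hat")))
```

In this sketch, `passes_check(cross_entropy, [100, 10, 5])` accepts the cross-entropy tree, while a constant tree such as `"n"` is rejected because it cannot distinguish good predictions from bad ones. A filter of this kind is cheap compared with training a GNN on a proxy task, which is the motivation for pruning before evaluation.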

Abstract

Class-imbalanced node classification tasks are prevalent in real-world scenarios. Due to the uneven distribution of nodes across different classes, learning high-quality node representations remains a challenging endeavor. The engineering of loss functions has shown promising potential in addressing this issue. It involves the meticulous design of loss functions, utilizing information about the quantities of nodes in different categories and the network's topology to learn unbiased node representations. However, the design of these loss functions heavily relies on human expert knowledge and exhibits limited adaptability to specific target tasks. In this paper, we introduce a high-performance, flexible, and generalizable automated loss function search framework to tackle this challenge. Across 15 combinations of graph neural networks and datasets, our framework achieves a significant improvement in performance compared to state-of-the-art methods. Additionally, we observe that homophily in graph-structured data significantly contributes to the transferability of the proposed framework.
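To make concrete what a hand-crafted loss that uses per-class node counts can look like, here is a minimal sketch of cross-entropy re-weighted by inverse class frequency. This is one common generic choice, shown only as an illustration; it is not claimed to be any of the baselines compared in the paper, and the function name and weighting formula are assumptions.

```python
import math


def weighted_cross_entropy(probs, labels, class_counts):
    """Mean cross-entropy with inverse-frequency class weights.

    probs: per-node lists of predicted class probabilities
    labels: per-node integer class labels
    class_counts: number of training nodes in each class
    """
    total = sum(class_counts)
    k = len(class_counts)
    # Weight is 1.0 for every class when the classes are perfectly balanced.
    weights = [total / (k * n) for n in class_counts]
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With counts `[90, 10]`, an equally uncertain prediction costs more on a minority-class node than on a majority-class node, which pushes the model away from always favoring the majority class. The weighting scheme is fixed by hand here; the framework described above instead searches over such expressions automatically.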

Paper Structure (19 sections, 2 equations, 5 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Preliminary
AutoLINC
Problem Definition
Search Space
Search Algorithm
Loss Function Check
Framework of AutoLINC
Proxy Task
Experiments
Experimental Setup
Comparison with SOTA Loss Functions
Transferability and Convergence
Comparison with SOTA Non-loss Function Engineering Methods
...and 4 more sections

Figures (5)

Figure 1: A schematic diagram of the AutoLINC framework with two main modules. The first module is the Monte Carlo tree search, which iteratively performs selection, expansion, simulation, and backpropagation steps to find the optimal loss function. The second module is the loss function check strategy, which AutoLINC uses to proactively filter out low-quality loss functions when evaluating candidates.
Figure 2: This figure illustrates the average scores of the Top 10 loss metrics during the search process. Naive represents MCTS without the proposed strategies.
Figure 3: The runtime on GCN on the PubMed dataset using 10 random seeds.
Figure 4: The convergence of the loss functions discovered in Table \ref{tab:exp}.
Figure 5: The runtime on GCN across three citation network datasets using 10 random seeds.

Theorems & Definitions (2)

Definition 3.1
Definition 4.1
