Semi-supervised Node Importance Estimation with Informative Distribution Modeling for Uncertainty Regularization

Yankai Chen, Taotao Wang, Yixiang Fang, Yunyu Xiao

TL;DR

This work addresses the problem of estimating continuous node importance in heterogeneous graphs when ground-truth labels are partially available. It introduces EASING, a semi-supervised framework that explicitly models uncertainty via a Distribution-based Joint Estimator (DJE) to jointly predict node importance and its uncertainty, and to generate high-quality pseudo-labels for unlabeled nodes. The learning objective combines labeled and pseudo-labeled data under a heteroscedastic regression regime, enabling uncertainty-aware regularization and robust training. Empirical results on three real-world datasets show that EASING consistently outperforms strong baselines on both value estimation and ranking, with improved robustness to limited labeled data and strong compatibility with other graph models. The approach promises practical impact for applications requiring reliable node significance measures under scarce labeling and heterogeneous information.

Abstract

Node importance estimation, a classical problem in network analysis, underpins various web applications. Previous methods either exploit intrinsic topological characteristics, e.g., graph centrality, or leverage additional information, e.g., data heterogeneity, for node feature enhancement. However, these methods follow the supervised learning setting, overlooking the fact that ground-truth node-importance data are usually only partially labeled in practice. In this work, we propose the first semi-supervised node importance estimation framework, i.e., EASING, to improve learning quality for unlabeled data in heterogeneous graphs. Unlike previous approaches, EASING explicitly captures uncertainty to reflect the confidence of model predictions. To jointly estimate the importance values and uncertainties, EASING incorporates DJE, a deep encoder-decoder neural architecture. DJE introduces distribution modeling for graph nodes, and both importance and uncertainty estimates are derived from the resulting distribution representations. Additionally, DJE facilitates effective pseudo-label generation for the unlabeled data to enrich the training samples. Based on labeled and pseudo-labeled data, EASING develops effective semi-supervised heteroscedastic learning with varying node uncertainty regularization. Extensive experiments on three real-world datasets highlight the superior performance of EASING compared to competing methods. Code is available at https://github.com/yankai-chen/EASING.
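
To make the idea of heteroscedastic learning over labeled and pseudo-labeled nodes more concrete, here is a minimal sketch. It is not the EASING objective, which this summary does not reproduce. It assumes a standard Gaussian negative log-likelihood in which each node has its own predicted variance, and it combines the labeled and pseudo-labeled terms with a hypothetical weight `pseudo_weight`. The function names and the tuple layout are illustrative assumptions as well.

```python
import math


def gaussian_nll(pred_mean, pred_log_var, target):
    """Per-node heteroscedastic Gaussian NLL, with the constant term dropped.

    Assumption: the model outputs a mean and a log-variance per node.
    A larger predicted variance shrinks the squared-error term but is
    penalized by the log-variance term.
    """
    squared_error = (target - pred_mean) ** 2
    return 0.5 * (pred_log_var + squared_error / math.exp(pred_log_var))


def semi_supervised_loss(labeled, pseudo_labeled, pseudo_weight=0.5):
    """Average NLL over labeled nodes plus a weighted average over pseudo-labeled nodes.

    Each item is a (pred_mean, pred_log_var, target) tuple; for pseudo-labeled
    nodes the target is the pseudo-label. `pseudo_weight` is a hypothetical
    hyperparameter, not taken from the paper.
    """
    def mean_nll(items):
        if not items:
            return 0.0
        return sum(gaussian_nll(m, lv, t) for m, lv, t in items) / len(items)

    return mean_nll(labeled) + pseudo_weight * mean_nll(pseudo_labeled)


if __name__ == "__main__":
    labeled = [(0.0, 0.0, 2.0)]                 # confident prediction, large error
    pseudo = [(0.0, math.log(4.0), 2.0)]        # same error, higher predicted variance
    print(semi_supervised_loss(labeled, pseudo))
```

In this sketch the same prediction error costs less for a node with higher predicted variance, which is the sense in which per-node uncertainty acts as a regularizer on noisy or pseudo-labeled targets.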

Paper Structure

This paper contains 36 sections, 2 theorems, 30 equations, 4 figures, 13 tables.

Key Result

Theorem 1

For $x' \in \mathcal{D}'$ and $i \in \{1,2\}$, let $s_{x'}$ be the "unknown ground-truth" label of $x'$; then we have:

Figures (4)

  • Figure 1: With existing partially labeled data, i.e., indicated by golden stars, EASING further constructs pseudo-labeled importance values (in black) and uncertainties (in gray) for semi-supervised learning.
  • Figure 2: Illustration of our EASING framework with (A) DJE structure and (B) uncertainty-regularized learning flow.
  • Figure 3: Uncertainties of labeled and pseudo-labeled data.
  • Figure 4: Model performance with different labeled data percentage.

Theorems & Definitions (2)

  • Theorem 1: Pseudo-label Quality
  • Theorem 1: Pseudo-label Quality