CHAT: Beyond Contrastive Graph Transformer for Link Prediction in Heterogeneous Networks

Shengming Zhang, Le Zhang, Jingbo Zhou, Hui Xiong

TL;DR

The Contrastive Heterogeneous grAph Transformer (CHAT) introduces a novel sampling-based graph transformer technique that selectively retains nodes of interest, thereby obviating the need for predefined meta-paths.

Abstract

Link prediction in heterogeneous networks is crucial for understanding the intricacies of network structures and forecasting their future developments. Traditional methodologies often face significant obstacles, including over-smoothing, wherein the excessive aggregation of node features leads to the loss of critical structural details, and a dependency on human-defined meta-paths, which necessitate extensive domain knowledge and can be inherently restrictive. These limitations hinder the effective prediction and analysis of complex heterogeneous networks. In response to these challenges, we propose the Contrastive Heterogeneous grAph Transformer (CHAT). CHAT introduces a novel sampling-based graph transformer technique that selectively retains nodes of interest, thereby obviating the need for predefined meta-paths. The method employs an innovative connection-aware transformer to encode node sequences and their interconnections with high fidelity, guided by a dual-faceted loss function specifically designed for heterogeneous network link prediction. Additionally, CHAT incorporates an ensemble link predictor that synthesizes multiple samplings to achieve enhanced prediction accuracy. We conducted comprehensive evaluations of CHAT using three distinct drug-target interaction (DTI) datasets. The empirical results underscore CHAT's superior performance, outperforming both general-task approaches and models specialized in DTI prediction. These findings substantiate the efficacy of CHAT in addressing the complex problem of link prediction in heterogeneous networks.

Paper Structure

This paper contains 25 sections, 1 theorem, 6 equations, 5 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

The Concentrated Graph Sampling is a generalized form of meta-path-based approaches.

Figures (5)

  • Figure 1: Comparison between GNN and transformer-based approaches in heterogeneous networks. A GNN explores up to k-hop neighbors, so nodes beyond k hops are unreachable (red region), and increasing k leads to over-smoothing. A transformer reaches long-distance neighbors, extending the reachable region (blue region) without over-smoothing.
  • Figure 2: Architecture of CHAT. The green node $(1)$ is the head node, the blue nodes $(3,4,6)$ are tail nodes, and the gray $(2)$ and orange $(5)$ nodes are non-interest nodes. The graph sampling technique first samples subgraph sequences from the heterogeneous network (top left) and converts non-interest nodes into connections, i.e., tuples of edge types (top center, concatenated colored blocks). A connection encoding of the same dimension as the node features is adopted for each connection (top right), producing a feature matrix that is combined with position encodings as the input to the transformer. The connection-aware transformer is supervised by two loss functions: the contrastive link prediction loss and the observation probability loss. An ensemble link predictor performs link prediction based on multiple views of the data samples.
  • Figure 3: Ablation studies on three datasets.
  • Figure 4: Relative importance of top-30 connections.
  • Figure 5: Sensitivity analysis of sample size.
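
The sampling step described in the Figure 2 caption (retain nodes of interest, convert non-interest nodes into tuples of edge types) can be illustrated with a minimal sketch. This is not the authors' implementation: the random-walk strategy, the function name `sample_sequence`, the toy graph, and the edge-type names are all assumptions made for illustration only.

```python
import random


def sample_sequence(adj, node_type, start, interest_types, walk_len, rng):
    """Random walk from `start` that keeps interest nodes and folds the rest.

    adj:        dict mapping node -> list of (edge_type, neighbor) pairs
    node_type:  dict mapping node -> type label
    Returns (nodes, connections); connections[i] is the tuple of edge types
    traversed between nodes[i] and nodes[i + 1].
    """
    nodes = [start]
    connections = []
    pending = []  # edge types crossed since the last retained node
    current = start
    for _ in range(walk_len):
        neighbors = adj.get(current, [])
        if not neighbors:
            break
        edge_type, current = rng.choice(neighbors)
        pending.append(edge_type)
        if node_type[current] in interest_types:
            # Interest node reached: emit it and the connection leading to it.
            nodes.append(current)
            connections.append(tuple(pending))
            pending = []
    # Edge types left in `pending` (walk ended on a non-interest node) are dropped.
    return nodes, connections


# Hypothetical toy network: a drug and a target linked only through a disease.
node_type = {"d1": "drug", "s1": "disease", "t1": "target"}
adj = {
    "d1": [("drug-disease", "s1")],
    "s1": [("disease-target", "t1")],
    "t1": [("target-disease", "s1")],
}

nodes, connections = sample_sequence(
    adj, node_type, "d1", {"drug", "target"}, walk_len=3, rng=random.Random(0)
)
print(nodes)        # ['d1', 't1']
print(connections)  # [('drug-disease', 'disease-target')]
```

In this toy run the disease node is never emitted; it survives only as the edge-type tuple between the drug and the target. This mirrors the caption's description of non-interest nodes becoming connections, without requiring a meta-path to be specified in advance.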

Theorems & Definitions (1)

  • Theorem 1