AH-UGC: Adaptive and Heterogeneous-Universal Graph Coarsening

Mohit Kataria, Shreyash Bhilwade, Sandeep Kumar, Jayadeva

TL;DR

This work addresses the lack of adaptive and heterogeneous graph coarsening by proposing AH-UGC, a unified framework that fuses Locality-Sensitive Hashing (LSH) with Consistent Hashing (CH) to produce multiple coarsened graphs from a single projection. It introduces a type-isolated coarsening strategy for heterogeneous graphs and an augmented node representation that blends features and topology, enabling robust multi-resolution reductions with $\tilde{A} = \mathcal{C}^\top A \mathcal{C}$. The approach is model-agnostic, streaming-friendly, and scalable, validated on 23 real-world datasets showing favorable runtime, spectral fidelity (HE, RcE, REE), and downstream node-classification accuracy across homogeneous, heterophilic, and heterogeneous graphs. Overall, AH-UGC offers a practical, scalable solution for adaptive graph coarsening that preserves semantic integrity in complex graphs, unlocking efficient learning and inference at multiple resolutions.

Abstract

$\textbf{Graph Coarsening (GC)}$ is a prominent graph reduction technique that compresses large graphs to enable efficient learning and inference. However, existing GC methods generate only one coarsened graph per run and must recompute from scratch for each new coarsening ratio, resulting in unnecessary overhead. Moreover, most prior approaches are tailored to $\textit{homogeneous}$ graphs and fail to accommodate the semantic constraints of $\textit{heterogeneous}$ graphs, which comprise multiple node and edge types. To overcome these limitations, we introduce a novel framework that combines Locality Sensitive Hashing (LSH) with Consistent Hashing to enable $\textit{adaptive graph coarsening}$. Leveraging hashing techniques, our method is inherently fast and scalable. For heterogeneous graphs, we propose a $\textit{type isolated coarsening}$ strategy that ensures semantic consistency by restricting merges to nodes of the same type. Our approach is the first unified framework to support both adaptive and heterogeneous coarsening. Extensive evaluations on 23 real-world datasets including homophilic, heterophilic, homogeneous, and heterogeneous graphs demonstrate that our method achieves superior scalability while preserving the structural and semantic integrity of the original graph.

Paper Structure

This paper contains 20 sections, 5 theorems, 27 equations, 7 figures, 10 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $x, y \in \mathbb{R}^d$, and let the projection function be defined as: $h(x) = \sum_{j=1}^{\ell} r_j^\top x, \quad r_j \sim \mathcal{N}(0, I_d) \text{ i.i.d.}$ Then the difference $h(x) - h(y) \sim \mathcal{N}(0, \ell \|x - y\|^2)$, and for any $\varepsilon > 0$:
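The distributional part of the statement can be checked numerically. The sketch below is only an illustration, not the authors' code: it draws many independent sets of $\ell$ Gaussian directions, evaluates $h(x) - h(y)$ for each, and compares the empirical mean and variance with the predicted $0$ and $\ell \|x - y\|^2$. The helper name `sample_differences`, the example vectors, and the values of `ell` and `trials` are assumptions chosen for the demonstration.

```python
import numpy as np

def sample_differences(x, y, ell, trials, seed=0):
    """Return h(x) - h(y) for `trials` independent draws of r_1, ..., r_ell."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Shape (trials, ell, d): one set of ell standard Gaussian directions per trial.
    R = rng.standard_normal((trials, ell, d))
    # By linearity, h(x) - h(y) = sum_j r_j^T (x - y).
    return R.sum(axis=1) @ (x - y)

# Hypothetical example vectors with ||x - y||^2 = 1.
x = np.array([1.0, 2.0, 0.5, -1.0])
y = np.array([0.5, 1.5, 1.0, -0.5])
ell = 8

diffs = sample_differences(x, y, ell, trials=200_000)
predicted_var = ell * np.sum((x - y) ** 2)

print("empirical mean:    ", diffs.mean())
print("empirical variance:", diffs.var())
print("predicted variance:", predicted_var)
```

The variance grows linearly in $\ell$ because the $\ell$ terms $r_j^\top (x - y)$ are independent, each with variance $\|x - y\|^2$.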

Figures (7)

  • Figure 1: AH-UGC consists of three modules: (a) $\mathcal{M}_{\text{LSH}}$ constructs an augmented feature matrix by combining node features and structural context using a heterophily-aware factor $\alpha$, enabling support for both homophilic and heterophilic graphs. Inspired by UGC \cite{kataria2024ugc}, we use LSH projections to compute node hash indices via $\psi({h^{\mathcal{P}k}}^l_1)$ (see Section \ref{sec:method}); (b) $\mathcal{M}_{\text{CH}}$ applies consistent hashing to merge nodes clockwise based on a target coarsening ratio $r$, yielding the coarsening matrix $\mathcal{C}$; (c) the coarsened graph $\mathcal{G}_c$ is obtained via $A_c = \mathcal{C}^\top A \mathcal{C}$. The framework is inherently adaptive: once an intermediate coarsening is obtained, further reduction can be applied incrementally using $\mathcal{M}_{\text{CH}}$ and the already computed coarsening matrix $\mathcal{C}$, enabling efficient multi-resolution processing.
  • Figure 2: Comparison of capability support across existing GC methods.
  • Figure 3: Empirical proof that two feature vectors remain close in projection space.
  • Figure 4: Supernode impurity across AH-UGC (left), UGC (center) and VAN (right) on the IMDB dataset. Different colors represent different node types (Movie, Director, Actor).
  • Figure 5: Node classification accuracy on the hDBLP dataset under decreasing coarsening ratios for three heteroGNN models: HeteroSGC (left), HeteroGCN (center), and HeteroGCN2 (right).
  • ...and 2 more figures
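
The coarsening step in the Figure 1 caption, $A_c = \mathcal{C}^\top A \mathcal{C}$, and the claim that further reduction can reuse an existing coarsening matrix can be illustrated with a small NumPy sketch. It assumes a binary assignment matrix (each node belongs to exactly one supernode) and uses hand-picked assignments; the helper `coarsening_matrix` and the toy graph are hypothetical and do not reproduce the paper's $\mathcal{M}_{\text{LSH}}$ or $\mathcal{M}_{\text{CH}}$ modules.

```python
import numpy as np

def coarsening_matrix(assignment, num_supernodes):
    """Binary n x k matrix C with C[i, s] = 1 iff node i belongs to supernode s."""
    n = len(assignment)
    C = np.zeros((n, num_supernodes))
    C[np.arange(n), assignment] = 1.0
    return C

# Toy undirected path graph on 6 nodes: 0-1-2-3-4-5.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Hypothetical first coarsening: 6 nodes -> 3 supernodes.
C1 = coarsening_matrix(np.array([0, 0, 1, 1, 2, 2]), 3)
A_c1 = C1.T @ A @ C1

# Further reduction applied to the coarse graph only: 3 supernodes -> 2.
C2 = coarsening_matrix(np.array([0, 0, 1]), 2)
A_c2 = C2.T @ A_c1 @ C2

# Coarsening the original graph with the composed matrix gives the same result.
C_total = C1 @ C2
assert np.allclose(A_c2, C_total.T @ A @ C_total)

print(A_c1)
print(A_c2)
```

Under this assumption, edges merged inside a supernode show up as diagonal (self-loop) weight in $A_c$, and the composed matrix `C1 @ C2` is itself a valid coarsening matrix, which is why a coarser level can be derived from an intermediate one without returning to the original graph.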

Theorems & Definitions (9)

  • Definition 2.1: Graph
  • Definition 2.2
  • Definition 2.3
  • Theorem 3.1
  • Lemma 1
  • Remark 1
  • Theorem 3.2: Explicit Load Balance via Random Rightward Merges
  • Theorem C.1: Explicit Load Balance via Random Rightward Merges
  • Theorem D.1: Projection Proximity for Similar Points