A Hierarchical Scale-free Graph Generator under Limited Resources
Xiaorui Qi, Yanlong Wen, Xiaojie Yuan
TL;DR
The paper tackles graph generation when training data are limited or unavailable by leveraging the invariant scale-free property of graphs. It introduces a two-stage hierarchical generator. The first stage forms substructures around sampled anchor nodes to steer the output toward scale-free structure. The second stage generates edges from a degree-mixing distribution, with two thresholds that control the tolerance toward exotic structures. Both stages are backed by theoretical guarantees. Across 12 datasets from three categories, the method generally aligns more closely with the ground-truth distributions (measured by MMD on degree, clustering, and orbit statistics) than traditional baselines and several deep generative models, though some highly clustered or lattice-like graphs remain challenging. The approach offers a practical, non-learned way to generate diverse graphs under resource constraints, with potential impact in domains where data are restricted or costly to obtain.
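To make the evaluation metric concrete, here is a minimal sketch of a squared-MMD comparison between two sets of graphs using their normalized degree histograms. It assumes a Gaussian RBF kernel on zero-padded histograms with Euclidean distance. This is a simplification chosen for brevity, not necessarily the kernel used in the paper; graph-generation benchmarks often use an EMD-based Gaussian kernel instead. The function names `degree_histogram`, `gaussian_kernel`, and `mmd_squared` and the `sigma` parameter are hypothetical.

```python
import math


def degree_histogram(edges, n):
    """Normalized degree histogram of an undirected graph on nodes 0..n-1 (n >= 1)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = [0] * (max(deg) + 1)
    for d in deg:
        counts[d] += 1
    # Fraction of nodes having each degree 0, 1, ..., max degree.
    return [c / n for c in counts]


def gaussian_kernel(p, q, sigma=1.0):
    """RBF kernel on two histograms, zero-padding the shorter one (assumed kernel)."""
    m = max(len(p), len(q))
    p = p + [0.0] * (m - len(p))
    q = q + [0.0] * (m - len(q))
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two lists of histograms."""
    def mean_k(a, b):
        total = sum(gaussian_kernel(p, q, sigma) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


triangle = degree_histogram([(0, 1), (1, 2), (0, 2)], 3)
star = degree_histogram([(0, 1), (0, 2), (0, 3)], 4)
print(mmd_squared([triangle], [star]))
```

Lower values mean the generated set is closer to the reference set. Clustering-coefficient histograms or orbit-count vectors could be passed through the same `mmd_squared` function in place of degree histograms.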
Abstract
Graph generation has become one of the most challenging tasks in recent years, and its core is learning the ground-truth distribution hidden in the training data. However, training data may be unavailable due to security concerns or prohibitive costs, which severely hampers learning-based models, especially deep generative models. This dilemma leads us to revisit non-learned generation methods based on invariant graph features. Building on the observed scale-free property, we propose a hierarchical scale-free graph generation algorithm with a two-stage generation strategy. In the first stage, we sample multiple anchor nodes that guide the formation of substructures, splitting the initial node set into multiple subsets. In the second stage, we progressively generate edges by sampling nodes from a degree-mixing distribution, adjusting the tolerance toward exotic structures via two thresholds. We provide theoretical guarantees for hierarchical generation and verify the effectiveness of our method on 12 datasets from three categories. Experimental results show that our method fits the ground-truth distribution better than various other generation strategies, including those built on other distributional observations.
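The two-stage strategy can be illustrated with a minimal sketch. The abstract does not specify the exact sampling rules, so everything below is a hypothetical interpretation, not the authors' algorithm. In stage 1, `k` anchors are sampled uniformly, and each remaining node joins an anchor's subset with probability proportional to that subset's current size. In stage 2, a degree-mixing distribution picks nodes with weight `alpha * (deg + 1) / total + (1 - alpha) / |S|`, a blend of degree-proportional and uniform sampling. The two thresholds `t1 < t2` split a uniform draw into three regimes: intra-subset edges, edges to another subset's anchor, and uniformly random "exotic" edges. All names and parameters (`hierarchical_generate`, `alpha`, `t1`, `t2`) are hypothetical.

```python
import random


def sample_by_degree_mix(candidates, deg, alpha, rng):
    """Pick one node from a mixture of degree-proportional and uniform weights (assumed form)."""
    total = sum(deg[v] + 1 for v in candidates)
    weights = [alpha * (deg[v] + 1) / total + (1 - alpha) / len(candidates)
               for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def hierarchical_generate(n, m, k, alpha=0.8, t1=0.7, t2=0.95, seed=0):
    """Hypothetical two-stage generator: anchor-based node split, then edge sampling."""
    assert 1 <= k <= n and 0.0 <= t1 <= t2 <= 1.0
    rng = random.Random(seed)

    # Stage 1: sample anchors and grow one subset per anchor (rich-get-richer split).
    anchors = rng.sample(range(n), k)
    groups = [[a] for a in anchors]
    group_of = {a: i for i, a in enumerate(anchors)}
    for v in range(n):
        if v in group_of:
            continue
        i = rng.choices(range(k), weights=[len(g) for g in groups], k=1)[0]
        groups[i].append(v)
        group_of[v] = i

    # Stage 2: progressively add edges until m edges exist or attempts run out.
    deg = [0] * n
    edges = set()
    target = min(m, n * (n - 1) // 2)
    all_nodes = list(range(n))
    attempts = 0
    while len(edges) < target and attempts < 100 * target:
        attempts += 1
        r = rng.random()
        if r < t1 or k == 1:
            # Intra-subset edge: both endpoints from one anchor's subset.
            sizes = [len(g) for g in groups]
            members = groups[rng.choices(range(k), weights=sizes, k=1)[0]]
            if len(members) < 2:
                continue
            u = sample_by_degree_mix(members, deg, alpha, rng)
            v = sample_by_degree_mix(members, deg, alpha, rng)
        elif r < t2:
            # Cross-subset edge: link a node to the anchor of a different subset.
            u = sample_by_degree_mix(all_nodes, deg, alpha, rng)
            v = rng.choice([a for i, a in enumerate(anchors) if i != group_of[u]])
        else:
            # "Exotic" edge: endpoints drawn uniformly, ignoring the hierarchy.
            u, v = rng.randrange(n), rng.randrange(n)
        if u == v:
            continue
        e = (min(u, v), max(u, v))
        if e in edges:
            continue
        edges.add(e)
        deg[u] += 1
        deg[v] += 1
    return sorted(edges), group_of


edges, group_of = hierarchical_generate(n=50, m=120, k=4, seed=1)
print(len(edges), len(set(group_of.values())))
```

In this sketch, raising `t2` toward 1 suppresses exotic edges, and raising `alpha` strengthens preferential attachment. Those are the two knobs one would tune toward a heavy-tailed degree distribution under these assumptions.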
