Crowdsourced Homophily Ties Based Graph Annotation Via Large Language Model
Yu Bu, Yulin Zhu, Kai Zhou
TL;DR
CSA-LLM addresses label scarcity in graph learning by fusing crowdsourced annotations with large language models and leveraging graph structure through homophily ties. It uses 1-hop and 2-hop subgraph contexts to generate robust prompts for LLM-based labeling and introduces a two-stage active-node filtering mechanism to prioritize informative nodes. Theoretical analysis shows homophily-dominant information is preserved in multi-hop neighborhoods under reasonable assumptions, and empirically CSA-LLM improves GNN performance on Cora and Citeseer. This approach reduces labeling costs while delivering higher-quality annotations that enhance downstream graph learning.
Abstract
Accurate graph annotation typically requires substantial labeled data, which is often challenging and resource-intensive to obtain. In this paper, we present Crowdsourced Homophily Ties Based Graph Annotation via Large Language Model (CSA-LLM), a novel approach that combines crowdsourced annotations with the capabilities of large language models (LLMs) to enhance the graph annotation process. CSA-LLM harnesses the structural context of graph data by integrating information from 1-hop and 2-hop neighbors. By emphasizing homophily ties, the key connections that signify similarity within the graph, CSA-LLM significantly improves the accuracy of annotations. Experimental results demonstrate that this method enhances the performance of Graph Neural Networks (GNNs) by delivering more precise and reliable annotations.
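As a rough illustration of how 1-hop and 2-hop context could be gathered and turned into an LLM prompt, the following is a minimal sketch and not the authors' implementation. The function names (`k_hop_neighbors`, `build_prompt`), the prompt wording, and the majority-vote summary of crowdsourced labels are all assumptions made for this example; the abstract does not specify any of them.

```python
from collections import Counter


def k_hop_neighbors(adj, node):
    """Return (1-hop, strict 2-hop) neighbor sets of `node`.

    `adj` is a hypothetical adjacency dict: node -> iterable of neighbors.
    The 2-hop set excludes the node itself and its 1-hop neighbors.
    """
    one_hop = set(adj.get(node, ())) - {node}
    two_hop = set()
    for nbr in one_hop:
        two_hop.update(adj.get(nbr, ()))
    two_hop -= one_hop | {node}
    return one_hop, two_hop


def build_prompt(adj, texts, crowd_labels, node, classes):
    """Assemble an illustrative labeling prompt from neighborhood context.

    `texts` maps node -> text, `crowd_labels` maps node -> list of
    crowdsourced votes. Majority voting is an assumption of this sketch.
    """
    one_hop, two_hop = k_hop_neighbors(adj, node)

    def describe(nodes):
        lines = []
        for n in sorted(nodes):
            votes = Counter(crowd_labels.get(n, []))
            label = votes.most_common(1)[0][0] if votes else "unknown"
            lines.append(f"- {texts[n]} (crowd label: {label})")
        return "\n".join(lines) or "- (none)"

    return (
        f"Target paper: {texts[node]}\n"
        f"1-hop neighbors:\n{describe(one_hop)}\n"
        f"2-hop neighbors:\n{describe(two_hop)}\n"
        f"Choose one category from: {', '.join(classes)}."
    )


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 with made-up texts and votes.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    texts = {i: f"paper-{i}" for i in adj}
    crowd = {1: ["ML", "ML", "DB"], 2: ["DB"]}
    print(build_prompt(adj, texts, crowd, 0, ["ML", "DB"]))
```

In this sketch the resulting string would be sent to an LLM of the reader's choice; how CSA-LLM actually weights homophily ties or filters active nodes is not covered here.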
