
Crowdsourced Homophily Ties Based Graph Annotation Via Large Language Model

Yu Bu, Yulin Zhu, Kai Zhou

TL;DR

CSA-LLM addresses label scarcity in graph learning by fusing crowdsourced annotations with large language models and leveraging graph structure through homophily ties. It uses 1-hop and 2-hop subgraph contexts to generate robust prompts for LLM-based labeling and introduces a two-stage active-node filtering mechanism to prioritize informative nodes. Theoretical analysis shows homophily-dominant information is preserved in multi-hop neighborhoods under reasonable assumptions, and empirically CSA-LLM improves GNN performance on Cora and Citeseer. This approach reduces labeling costs while delivering higher-quality annotations that enhance downstream graph learning.
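To make the notion of homophily at different hop distances concrete, the following is a minimal sketch that measures the fraction of same-label node pairs at 1 hop and at 2 hops on a toy graph. The graph, the labels, and the helper names (`one_hop_pairs`, `two_hop_pairs`, `homophily_ratio`) are illustrative assumptions, not the paper's code, and the pairwise ratio used here is a common homophily measure rather than necessarily the quantity analyzed in the paper.

```python
from itertools import combinations


def one_hop_pairs(adj):
    """Unordered node pairs joined directly by an edge (self-loops ignored)."""
    return {frozenset((u, v)) for u, nbrs in adj.items() for v in nbrs if u != v}


def two_hop_pairs(adj):
    """Unordered pairs that share a neighbor but are not directly connected."""
    direct = one_hop_pairs(adj)
    pairs = set()
    for nbrs in adj.values():
        for u, v in combinations(sorted(nbrs), 2):
            pair = frozenset((u, v))
            if pair not in direct:
                pairs.add(pair)
    return pairs


def homophily_ratio(pairs, labels):
    """Fraction of pairs whose two endpoints carry the same label."""
    if not pairs:
        return 0.0
    same = sum(1 for p in pairs if len({labels[n] for n in p}) == 1)
    return same / len(pairs)


# Hypothetical toy graph (undirected, stored as adjacency sets) and labels.
adj = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b"},
}
labels = {"a": "ML", "b": "ML", "c": "ML", "d": "DB"}

h1 = homophily_ratio(one_hop_pairs(adj), labels)
h2 = homophily_ratio(two_hop_pairs(adj), labels)
print(f"1-hop homophily: {h1:.2f}, 2-hop homophily: {h2:.2f}")
```

On real data the `labels` mapping would hold ground-truth or pseudo-labels, and comparing the two ratios gives a rough indication of how much same-class signal survives when moving from direct neighbors to 2-hop neighbors.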

Abstract

Accurate graph annotation typically requires substantial labeled data, which is often challenging and resource-intensive to obtain. In this paper, we present Crowdsourced Homophily Ties Based Graph Annotation via Large Language Model (CSA-LLM), a novel approach that combines the strengths of crowdsourced annotations with the capabilities of large language models (LLMs) to enhance the graph annotation process. CSA-LLM harnesses the structural context of graph data by integrating information from 1-hop and 2-hop neighbors. By emphasizing homophily ties (key connections that signify similarity within the graph), CSA-LLM significantly improves the accuracy of annotations. Experimental results demonstrate that this method enhances the performance of Graph Neural Networks (GNNs) by delivering more precise and reliable annotations.
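As an illustration of how 1-hop and 2-hop context might be gathered into an annotation prompt, here is a minimal sketch. The prompt wording, the function names (`k_hop_neighbors`, `build_prompt`), the `max_per_hop` cap, and the toy data are all assumptions made for this example; the paper's actual prompt template and node-selection procedure are not reproduced here.

```python
def k_hop_neighbors(adj, node):
    """Return (1-hop, 2-hop) neighbor sets of `node`; 2-hop excludes 1-hop and the node itself."""
    one_hop = set(adj.get(node, ())) - {node}
    two_hop = set()
    for u in one_hop:
        two_hop |= set(adj.get(u, ()))
    two_hop -= one_hop | {node}
    return one_hop, two_hop


def build_prompt(node, adj, texts, classes, max_per_hop=3):
    """Assemble a plain-text prompt from the target node and its neighborhood.

    `max_per_hop` (hypothetical) caps how many neighbors per hop are included
    so the prompt stays short.
    """
    one_hop, two_hop = k_hop_neighbors(adj, node)
    lines = [f"Target paper: {texts[node]}"]
    for hop_name, group in (("1-hop", one_hop), ("2-hop", two_hop)):
        for n in sorted(group)[:max_per_hop]:
            lines.append(f"{hop_name} neighbor: {texts[n]}")
    lines.append("Choose one category: " + ", ".join(classes))
    return "\n".join(lines)


# Hypothetical toy citation graph with short text attributes per node.
adj = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b"},
}
texts = {
    "a": "Graph convolution for node classification",
    "b": "Semi-supervised learning on graphs",
    "c": "Spectral methods for graph signals",
    "d": "Query optimization in relational databases",
}
classes = ["Machine Learning", "Databases"]

prompt = build_prompt("a", adj, texts, classes)
print(prompt)
```

The resulting string would be sent to whichever LLM serves as the annotator; collecting several such responses per node is one way the crowdsourcing aspect could be layered on top.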

Paper Structure

This paper contains 8 sections, 1 theorem, 7 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Consider a graph $G$ that does not contain self-loops and has a set of labels $\mathcal{Y}$ (pseudo-labels generated by LLM). For each node $v$, assume that the class labels of its neighbors $\{y_u : u \in N(v)\}$ are conditionally independent given the label $y_v$ of the node itself. Furthermore, a

Figures (5)

  • Figure 1: Crowdsourced Annotation via LLM for GNN.
  • Figure 2: Test Accuracy for LLM as Annotator.
  • Figure 3: Nodes Clustering According to C-Density.
  • Figure 4: Test Accuracy According to Distance in C-Density and Confidence Score.
  • Figure 5: Training and Testing Accuracy for GCN through CSA-LLM.

Theorems & Definitions (2)

  • Theorem 1
  • proof