GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation

Tao Feng, Yihang Sun, Jiaxuan You

TL;DR

GraphEval presents a novel approach to idea evaluation by transforming complex research ideas into fine-grained viewpoint-graphs. It decomposes ideas into evaluable viewpoints using prompted LLMs, then builds intra- and inter-idea connections via LLM-based relation extraction and embedding similarity. Evaluation is performed with two lightweight methods: GraphEval-LP, a training-free label propagation on the viewpoint-graph, and GraphEval-GNN, a small-parameter GNN with novelty-detection capabilities for assessing originality. Experiments on two academic datasets show that GraphEval-GNN achieves substantial accuracy gains with modest resource costs and can effectively detect plagiarized ideas, offering a scalable, bias-mitigated alternative to prompt-based evaluations.

Abstract

The powerful capabilities of Large Language Models (LLMs) have led to their growing use in evaluating human-generated content, particularly research ideas in academic settings. Existing solutions primarily rely on prompt-based LLM methods or fine-tuned lightweight language models for idea evaluation. However, these methods are often unstable and struggle to comprehend the complex semantic information embedded in the ideas, impeding their ability to perform high-quality evaluations. To address these challenges, we propose GraphEval, a lightweight graph-based LLM framework for idea evaluation. Our insight is that a complex idea can be broken down into comprehensible viewpoint nodes by prompting small LLMs. These viewpoint nodes can then be linked together through edges created from LLM-based relation extraction and/or BERT similarity scores. The resulting viewpoint-graph can be used to conveniently propagate scores across view-nodes to improve the robustness of the idea evaluations. In particular, we propose two lightweight graph-based methods for idea evaluation: (1) GraphEval-LP: a training-free label propagation algorithm that propagates evaluation scores from known view-nodes to unknown nodes; (2) GraphEval-GNN: a Graph Neural Network (GNN) that is trained to predict the evaluation scores given the observed graph with minimal computation resources. Moreover, to overcome LLMs' limitations in objectively assessing the novelty of ideas, we further add a novelty detection model to GraphEval-GNN to enhance its capability in judging idea novelty. Experiments on two datasets show GraphEval improves F1 scores by at least 14% with low computation and API costs. Additionally, GraphEval can effectively detect plagiarized ideas.
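
As a rough illustration of the training-free idea behind GraphEval-LP, the sketch below runs a generic weighted label propagation over a toy viewpoint-graph. It is not the authors' implementation: the function name `propagate_labels`, the edge weights (standing in for similarity scores), the clamping of known nodes, and the fixed iteration count are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def propagate_labels(
    num_nodes: int,
    edges: List[Tuple[int, int, float]],
    known: Dict[int, List[float]],
    num_classes: int,
    iterations: int = 20,
) -> List[List[float]]:
    """Generic weighted label propagation (illustrative, not the paper's exact rule).

    edges: (u, v, weight) triples, treated as undirected.
    known: node index -> fixed score vector (e.g. one-hot accept/reject).
    """
    # Build an undirected weighted adjacency list.
    neighbors: List[List[Tuple[int, float]]] = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Unknown nodes start from an all-zero score vector.
    scores = [list(known.get(i, [0.0] * num_classes)) for i in range(num_nodes)]

    for _ in range(iterations):
        updated: List[List[float]] = []
        for i in range(num_nodes):
            if i in known:
                # Clamp labelled nodes to their known scores every round.
                updated.append(list(known[i]))
                continue
            total = sum(w for _, w in neighbors[i])
            if total == 0.0:
                # Isolated node: nothing to propagate, keep its current scores.
                updated.append(scores[i])
                continue
            # Weighted average of the neighbours' scores from the previous round.
            updated.append([
                sum(w * scores[j][c] for j, w in neighbors[i]) / total
                for c in range(num_classes)
            ])
        scores = updated
    return scores


# Toy chain of four view-nodes: node 0 is labelled "accept", node 3 "reject".
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
toy_known = {0: [1.0, 0.0], 3: [0.0, 1.0]}
toy_scores = propagate_labels(4, toy_edges, toy_known, num_classes=2)
```

In this toy chain, the unlabelled node next to the "accept" node ends up leaning towards accept, and the one next to the "reject" node leans towards reject, which is the behaviour the abstract describes as propagating scores from known view-nodes to unknown ones.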

Paper Structure

This paper contains 24 sections, 4 equations, 5 figures, 26 tables, 1 algorithm.

Figures (5)

  • Figure 1: Current LLMs are highly sensitive to prompts and show biases in evaluations. This figure illustrates that even minor variations in the LLM's prompts (Original Prompt, Positive Prompt, Negative Prompt) for the same idea can lead to drastic changes in the final LLM evaluation results. Moreover, the LLM tends to always give friendly evaluations like 'Accept' and rarely gives negative evaluations such as 'Reject'. This observation demonstrates that the LLM evaluation is biased.
  • Figure 2: GraphEval performs a better idea evaluation than the existing LLM-based method by focusing on both the global and local information of the idea. In this figure, the parts of the idea highlighted in red contain factual errors. The existing LLM-based method shown on the far left focuses solely on the global information of the idea, which often leads to overlooking factual errors interspersed within the idea. In contrast, GraphEval decomposes the idea into viewpoints to obtain scores for each viewpoint, then employs Mean Pooling and Min Pooling to extract global and local information of the idea, respectively. Finally, GraphEval derives a fair and unbiased evaluation based on these two aspects of information.
  • Figure 3: Overview of GraphEval methodology. GraphEval first transforms the ideas into a viewpoint-graph via Viewpoint-Graph Extraction, which contains multiple viewpoint-subgraphs, viewpoint-nodes, and edges between viewpoint-nodes. Then two lightweight GraphEval implementations named GraphEval-LP and GraphEval-GNN are employed to evaluate the ideas. Note that AGG denotes the aggregation function.
  • Figure 4: Novelty assessment can significantly improve the performance of GraphEval when detecting plagiarized or derivative ideas. We compare two variants of GraphEval on the ICLR Papers dataset and evaluate their performance on four metrics.
  • Figure 5: Example of viewpoint extraction from a research idea. This figure illustrates how a prompted LLM extracts fine-grained viewpoints from a research idea. Each viewpoint represents an independent, evaluable unit such as an idea, argument, or fact. The viewpoints capture distinct components of the research idea that contribute to its overall understanding.
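
The Figure 2 caption describes combining Mean Pooling (global information) with Min Pooling (local information) over per-viewpoint scores. A minimal sketch of that idea follows. The caption does not say how the two pooled values are combined, so the linear blend and the `alpha` parameter below are hypothetical, as is the function name `pool_viewpoint_scores`.

```python
from typing import List


def pool_viewpoint_scores(scores: List[float], alpha: float = 0.5) -> float:
    """Blend mean pooling (global view) with min pooling (weakest viewpoint).

    alpha is a hypothetical mixing weight, not a value taken from the paper.
    """
    if not scores:
        raise ValueError("at least one viewpoint score is required")
    mean_score = sum(scores) / len(scores)  # global information
    min_score = min(scores)                 # local information: worst viewpoint
    return alpha * mean_score + (1.0 - alpha) * min_score


# Two strong viewpoints and one containing a factual error (low score).
idea_score = pool_viewpoint_scores([0.9, 0.8, 0.1])
```

With mean pooling alone, the single low-scoring viewpoint would be partly averaged away. Including the minimum keeps that weak viewpoint visible in the final score, which matches the caption's point about not overlooking factual errors scattered within an idea.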