
Microsoft Academic Graph Information Retrieval for Research Recommendation and Assistance

Shikshya Shiwakoti, Samuel Goldsmith, Ujjwal Pandit

TL;DR

This work tackles scalable research discovery by proposing an Attention-Based Subgraph Retriever, which uses graph attention to prune a large knowledge graph into a focused subgraph that a large language model (LLM) then reasons over. The approach builds on GraphRAG-style ideas: it embeds textual content from the Microsoft Academic Graph (MAG) into a homogeneous graph, applies GAT-based pruning with a seed-node retrieval strategy, and then reranks the results with an LLM. Experiments show that the method underperforms traditional IR baselines on a 1k test set, but they also show that an LLM can feasibly reason over the retrieved subgraph. The study highlights practical challenges and outlines clear directions for future work: extending the method to heterogeneous graphs, larger MAG subsets, and dynamic graph scenarios, with improved robustness.
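The summary does not say how the seed node is chosen. One plausible approach, shown here purely as an assumption and not as the authors' method, is to rank nodes by cosine similarity between each node's text embedding and the query embedding and keep the top k. The function name `select_seed_nodes` and the toy embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_seed_nodes(
    node_embeddings: Dict[int, Sequence[float]],
    query_embedding: Sequence[float],
    k: int = 1,
) -> List[int]:
    """Hypothetical seed selection: the k nodes most similar to the query."""
    ranked = sorted(
        node_embeddings,
        key=lambda n: cosine(node_embeddings[n], query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): 2-D "text embeddings" for three papers.
emb = {10: [1.0, 0.0], 11: [0.0, 1.0], 12: [0.9, 0.1]}
print(select_seed_nodes(emb, [1.0, 0.0], k=2))  # → [10, 12]
```

In practice the embeddings would come from whatever text encoder produced the node features; the ranking step itself does not depend on that choice.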

Abstract

In today's information-driven world, scientific publications have become increasingly easy to access. At the same time, filtering through the massive volume of available research has become more challenging than ever. Graph Neural Networks (GNNs) and graph attention mechanisms have proven effective for searching large-scale knowledge bases, particularly when combined with modern large language models (LLMs). In this paper, we propose an Attention-Based Subgraph Retriever, a GNN-as-retriever model that applies attention-based pruning to extract a refined subgraph, which is then passed to a large language model for advanced knowledge reasoning.

Paper Structure

This paper contains 21 sections, 4 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: The framework of GRIL
  • Figure 2: Outline of the SAGPool attention pruning mechanism.
  • Figure 3: Algorithm of the Attention-based Graph Retriever. It takes the graph, query, seed node, and layer count $L$ (the number of hops) as input. It first initializes the frontier $P_1$ to the neighbors of the seed, then computes attention scores by convolving over the subgraph with respect to the query. Nodes whose attention scores fall below the threshold hyperparameter $\sigma$ are then pruned via SAGPool. This process repeats $L$ times, yielding a subgraph with a maximum radius of $L$ hops.
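The retrieval loop described in Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the learned GAT/SAGPool scorer is replaced by a hypothetical `attention_scores` function (one round of mean aggregation over the current subgraph followed by cosine similarity to the query embedding), the graph is a plain adjacency dictionary, and nodes that have already been scored are not revisited. The names `attention_subgraph_retriever`, `num_layers`, and `sigma` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def attention_scores(nodes: Set[int], adj: Dict[int, List[int]],
                     emb: Dict[int, Sequence[float]],
                     query: Sequence[float]) -> Dict[int, float]:
    """Hypothetical stand-in for the learned GAT/SAGPool score.

    Each node's embedding is averaged with its neighbours *inside the current
    subgraph* (a simple graph convolution), then compared to the query.
    """
    scores = {}
    for v in nodes:
        group = [v] + [u for u in adj[v] if u in nodes]
        dim = len(emb[v])
        h = [sum(emb[u][i] for u in group) / len(group) for i in range(dim)]
        scores[v] = cosine(h, query)
    return scores


def attention_subgraph_retriever(adj: Dict[int, List[int]],
                                 emb: Dict[int, Sequence[float]],
                                 query: Sequence[float], seed: int,
                                 num_layers: int, sigma: float) -> Set[int]:
    """Expand from the seed for num_layers hops, pruning nodes scoring < sigma."""
    kept: Set[int] = {seed}
    frontier: Set[int] = set(adj[seed]) - kept  # P_1: neighbours of the seed
    seen: Set[int] = kept | frontier            # assumption: score each node once
    for _ in range(num_layers):
        if not frontier:
            break
        scores = attention_scores(kept | frontier, adj, emb, query)
        survivors = {v for v in frontier if scores[v] >= sigma}  # threshold prune
        kept |= survivors
        # Next frontier: unseen neighbours of nodes that survived this hop.
        frontier = {u for v in survivors for u in adj[v]} - seen
        seen |= frontier
    return kept


# Toy example (hypothetical data): 2-D embeddings, query along the x-axis.
ADJ = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
EMB = {0: [1.0, 0.0], 1: [1.0, 0.1], 2: [0.0, 1.0],
       3: [1.0, 0.0], 4: [0.0, 1.0]}
print(sorted(attention_subgraph_retriever(ADJ, EMB, [1.0, 0.0],
                                          seed=0, num_layers=2, sigma=0.8)))
# → [0, 1, 3]
```

Because a node can only enter the frontier as a neighbour of a node that survived the previous hop, every retained node lies within `num_layers` hops of the seed, which matches the maximum radius of $L$ stated in the caption. Lowering `sigma` (e.g. to 0.5 in the toy graph) keeps node 2 as well, while node 4, whose aggregated embedding is orthogonal to the query, is still pruned.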