Microsoft Academic Graph Information Retrieval for Research Recommendation and Assistance
Shikshya Shiwakoti, Samuel Goldsmith, Ujjwal Pandit
TL;DR
This work tackles scalable research discovery by proposing an Attention-Based Subgraph Retriever, which uses graph attention to prune a large knowledge graph into a focused subgraph that a large language model (LLM) then reasons over. The approach builds on GraphRAG-style ideas. Textual content from the Microsoft Academic Graph (MAG) is embedded into a homogeneous graph. A graph attention network (GAT) then prunes that graph outward from nodes selected by a seed-node retrieval strategy, and an LLM reranks the results. On a 1k test set the method underperforms traditional information retrieval (IR) baselines, but it shows that an LLM can produce reasoning grounded in the retrieved subgraph. The study highlights practical challenges and outlines directions for future work: extending the method to heterogeneous graphs, larger MAG subsets, and dynamic graphs, and improving its robustness.
Abstract
In today's information-driven world, scientific publications have become increasingly easy to access, yet filtering through the massive volume of available research has become more challenging than ever. Graph Neural Networks (GNNs) and graph attention mechanisms have proven effective for retrieval over large-scale information databases, particularly when combined with modern large language models (LLMs). In this paper, we propose an Attention-Based Subgraph Retriever, a GNN-as-retriever model that applies attention-based pruning to extract a refined subgraph, which is then passed to an LLM for knowledge reasoning.
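The retrieve-then-prune pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions. It uses a single-head GAT-style attention layer with random, untrained weights where a real retriever would learn them. Seed nodes are chosen by cosine similarity between a query embedding and node text embeddings. Pruning is a greedy expansion that keeps only the top-`top_m` highest-attention neighbours of each frontier node for `hops` steps. The function names, the `hops` and `top_m` parameters, and the edge-list prompt format are all hypothetical, and the LLM reranking step is omitted.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    # LeakyReLU as used in the original GAT attention scoring
    return np.where(x > 0, x, slope * x)


def gat_attention(H, edges, W, a):
    """Single-head GAT-style coefficients alpha[(i, j)] for each directed edge i->j,
    softmax-normalised over the out-neighbours j of each node i."""
    Z = H @ W  # projected node features, shape (n, d_out)
    by_src = {}
    for i, j in edges:
        # unnormalised score e_ij = LeakyReLU(a^T [W h_i || W h_j])
        e = float(leaky_relu(a @ np.concatenate([Z[i], Z[j]])))
        by_src.setdefault(i, []).append((j, e))
    alpha = {}
    for i, nbrs in by_src.items():
        scores = np.array([e for _, e in nbrs])
        scores = np.exp(scores - scores.max())  # numerically stable softmax
        scores /= scores.sum()
        for (j, _), s in zip(nbrs, scores):
            alpha[(i, j)] = float(s)
    return alpha


def seed_nodes(H, q, k):
    # hypothetical seed-node retrieval: top-k nodes by cosine similarity to the query
    sims = (H @ q) / (np.linalg.norm(H, axis=1) * np.linalg.norm(q) + 1e-12)
    return [int(i) for i in np.argsort(-sims)[:k]]


def prune_subgraph(seeds, alpha, hops=2, top_m=2):
    # greedy attention-based pruning: from each frontier node keep only its
    # top_m highest-attention out-edges, for a fixed number of hops
    nbrs = {}
    for (i, j), s in alpha.items():
        nbrs.setdefault(i, []).append((s, j))
    kept_nodes, kept_edges, frontier = set(seeds), [], list(seeds)
    for _ in range(hops):
        nxt = []
        for i in frontier:
            for s, j in sorted(nbrs.get(i, []), reverse=True)[:top_m]:
                kept_edges.append((i, j, s))
                if j not in kept_nodes:
                    kept_nodes.add(j)
                    nxt.append(j)
        frontier = nxt
    return kept_nodes, kept_edges


def to_prompt(kept_edges, titles):
    # hypothetical serialisation of the pruned subgraph as text for an LLM prompt
    return "\n".join(f"{titles[i]} -> {titles[j]} (attention {s:.2f})" for i, j, s in kept_edges)


# toy demo: random vectors stand in for MAG paper text embeddings
rng = np.random.default_rng(0)
n, d, d_out = 8, 16, 8
H = rng.normal(size=(n, d))
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 3), (3, 1), (2, 4), (4, 2),
         (3, 5), (5, 3), (4, 6), (6, 4), (5, 7), (7, 5), (6, 7), (7, 6)]
W = rng.normal(size=(d, d_out)) / np.sqrt(d)
a = rng.normal(size=2 * d_out)
q = H[0] + 0.1 * rng.normal(size=d)  # a query embedding close to paper 0

alpha = gat_attention(H, edges, W, a)
seeds = seed_nodes(H, q, k=1)
nodes, kept = prune_subgraph(seeds, alpha, hops=2, top_m=1)
print(to_prompt(kept, [f"paper_{i}" for i in range(n)]))
```

Keeping a fixed number of neighbours per hop bounds the subgraph size regardless of node degree, which keeps the serialised context short enough for an LLM. A threshold on the attention weights would be an equally plausible pruning rule.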
