HetGCoT: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Academic Question Answering

Runsong Jia, Mengjia Wu, Ying Ding, Jie Lu, Yi Zhang

TL;DR

HetGCoT tackles academic QA over heterogeneous scholarly graphs by integrating HGNN-derived structural evidence with LLM-based reasoning. It introduces metapath naturalization, adaptive metapath selection via HGT and FastGTN, and a four-step chain-of-thought framework to produce interpretable answers. Across OpenAlex and DBLP, HetGCoT achieves state-of-the-art performance on journal recommendation and general academic QA tasks, while demonstrating strong adaptability across LLM architectures. The work highlights gains in interpretability and performance, with future work aiming to scale to more relation types and larger interdisciplinary datasets.

Abstract

Academic question answering (QA) in heterogeneous scholarly networks presents unique challenges, requiring both structural understanding and interpretable reasoning. While graph neural networks (GNNs) capture structured graph information and large language models (LLMs) demonstrate strong capabilities in semantic comprehension, current approaches lack integration at the reasoning level. We propose HetGCoT, a framework that enables LLMs to leverage and learn from graph information in order to produce interpretable academic QA results. Our framework makes three technical contributions: (1) a method that transforms heterogeneous graph structural information into LLM-processable reasoning chains, (2) an adaptive metapath selection mechanism that identifies relevant subgraphs for specific queries, and (3) a multi-step reasoning strategy that systematically incorporates graph context into the reasoning process. Experiments on the OpenAlex and DBLP datasets show that our approach outperforms all state-of-the-art baselines. The framework also demonstrates adaptability across different LLM architectures and applicability to various scholarly question answering tasks.

Paper Structure (31 sections, 6 equations, 2 figures, 12 tables)