Struct-X: Enhancing Large Language Models Reasoning with Structured Data
Xiaoyu Tan, Haoyu Wang, Xihe Qiu, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi
TL;DR
Struct-X tackles the challenge of leveraging structured data for large language model (LLM) reasoning with a read-model-fill-reflect-reason workflow: it encodes knowledge graphs topologically, fills in missing entity information via retrieval, and distills tokens with Self-Reg before feeding a condensed, topology-aware input to the LLM. A Graph Topology Encoder and a dedicated Knowledge Injection/Retrieval pipeline preserve both semantic and topological information, while an Auxiliary Module generates adaptive prompts to steer reasoning. Empirical results on knowledge-graph QA and long-document reading comprehension tasks show state-of-the-art performance across four benchmarks, with notable gains over embedding-based methods and open-source LLMs, including substantial improvements for Llama2 when augmented with Struct-X. The work demonstrates that dynamic knowledge injection, selective retrieval, and topology-aware representation can significantly boost complex, multi-hop reasoning in LLMs, offering practical gains for interpreting real-world structured data. Future work may focus on richer graph representations and more robust prompting strategies to further improve factuality and efficiency.
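The idea that a topology-aware encoding keeps both semantic and topological information can be illustrated with a toy message-passing step. This is not the paper's Graph Topology Encoder. It is a minimal sketch that assumes a plain mean-aggregation update (a simplified GCN-style layer) over an undirected view of the triples. The names `build_adjacency`, `propagate`, and `self_weight`, the example graph, and the 2-d feature vectors are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_adjacency(triples: List[Triple]) -> Dict[str, List[str]]:
    """Undirected neighbor lists; relation labels are ignored in this toy version."""
    adj: Dict[str, List[str]] = {}
    for head, _rel, tail in triples:
        adj.setdefault(head, []).append(tail)
        adj.setdefault(tail, []).append(head)
    return adj


def propagate(features: Dict[str, List[float]],
              adj: Dict[str, List[str]],
              layers: int = 2,
              self_weight: float = 0.5) -> Dict[str, List[float]]:
    """Blend each node's own (semantic) vector with the mean of its neighbors'.

    Every node that appears in `adj` must also have an entry in `features`.
    Each extra layer lets information travel one more hop through the graph.
    """
    h = {node: list(vec) for node, vec in features.items()}
    for _ in range(layers):
        new_h: Dict[str, List[float]] = {}
        for node, vec in h.items():
            neighbors = adj.get(node, [])
            if not neighbors:
                new_h[node] = list(vec)  # isolated node: keep its vector
                continue
            mean = [sum(h[n][i] for n in neighbors) / len(neighbors)
                    for i in range(len(vec))]
            new_h[node] = [self_weight * v + (1 - self_weight) * m
                           for v, m in zip(vec, mean)]
        h = new_h
    return h


triples = [("Paris", "capital_of", "France"),
           ("France", "member_of", "EU"),
           ("Berlin", "capital_of", "Germany")]
features = {"Paris": [1.0, 0.0], "France": [0.0, 1.0], "EU": [0.0, 0.0],
            "Berlin": [1.0, 0.0], "Germany": [0.0, 1.0]}
emb = propagate(features, build_adjacency(triples), layers=1)
print(emb["France"])  # [0.25, 0.5]
```

After one layer, "France" carries half of its own vector and half of the average of "Paris" and "EU". Adding layers mixes in entities that are further away. A trained encoder would typically replace the fixed `self_weight` blend with learned, relation-aware weights.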
Abstract
Structured data, rich in logical and relational information, has the potential to enhance the reasoning abilities of large language models (LLMs). Still, integrating it is challenging because of the risk of overwhelming LLMs with excessive tokens and irrelevant context. To address this, we propose Struct-X, a novel framework that operates through five key phases, "read-model-fill-reflect-reason", to efficiently enable LLMs to utilize structured data. It begins by encoding structured data into a topological space using graph embeddings, then fills in missing entity information with knowledge retrieval modules and filters out irrelevant tokens via a self-supervised module. The final phase constructs a topological network from the selected tokens to further reduce the total token length for more effective LLM inference. Additionally, Struct-X includes an Auxiliary Module trained to generate prompts that aid LLMs in analyzing structured data. Extensive experiments on benchmarks, including a knowledge graph question-answering task and a long-document reading comprehension task, show that Struct-X notably improves LLM reasoning, demonstrating the effectiveness of structured data augmentation in improving LLM inference with complex input context.
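The fill, reflect, and reason phases can be pictured as a token-reduction pipeline over triples. The sketch below is a hypothetical illustration, not Struct-X itself. The retrieval store is a plain dictionary standing in for the knowledge retrieval modules. The filtering step replaces the paper's self-supervised module with a simple heuristic that keeps triples within `hops` edges of entities named in the query. The compact "topological" input is a head-grouped linearization. All function names and the example data are assumptions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def fill_missing(triples: List[Triple],
                 retrieved: Dict[str, List[Triple]]) -> List[Triple]:
    """'fill': append facts from a (hypothetical) retrieval store keyed by entity."""
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    filled = list(triples)
    for ent in entities:
        for fact in retrieved.get(ent, []):
            if fact not in filled:
                filled.append(fact)
    return filled


def k_hop_subgraph(triples: List[Triple], seeds: Set[str], hops: int) -> List[Triple]:
    """'reflect' stand-in: keep only triples within `hops` edges of the seeds."""
    frontier, reached = set(seeds), set(seeds)
    kept: List[Triple] = []
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for tr in triples:
            head, _, tail = tr
            if (head in frontier or tail in frontier) and tr not in kept:
                kept.append(tr)
                next_frontier |= {head, tail} - reached
        reached |= next_frontier
        frontier = next_frontier
    return kept


def linearize(triples: List[Triple]) -> str:
    """'reason' input: group facts by head entity so each head is written once."""
    grouped: Dict[str, List[str]] = {}
    for head, rel, tail in triples:
        grouped.setdefault(head, []).append(f"{rel} {tail}")
    return " | ".join(f"{h}: {', '.join(facts)}" for h, facts in grouped.items())


def build_context(triples: List[Triple], retrieved: Dict[str, List[Triple]],
                  query: str, hops: int = 2) -> str:
    filled = fill_missing(triples, retrieved)
    terms = set(query.lower().replace("?", " ").split())
    entities = {h for h, _, _ in filled} | {t for _, _, t in filled}
    seeds = {e for e in entities if e.lower() in terms}  # entities named in the query
    return linearize(k_hop_subgraph(filled, seeds, hops))


triples = [("Paris", "capital_of", "France"),
           ("Berlin", "capital_of", "Germany"),
           ("Germany", "currency", "Euro")]
retrieved = {"France": [("France", "currency", "Euro")]}  # hypothetical retrieval output
query = "What currency does the country whose capital is Paris use?"
context = build_context(triples, retrieved, query)
print(context)  # Paris: capital_of France | France: currency Euro
```

On this toy graph, the full filled graph linearizes to 15 whitespace tokens, while the retained subgraph needs only 7. In miniature, this mirrors the abstract's goal of shrinking total token length before inference without losing the multi-hop path from the query entity to the answer.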
