SuperRAG: Beyond RAG with Layout-Aware Graph Modeling
Jeff Yang, Duy-Khanh Vu, Minh-Tien Nguyen, Xuan-Quang Nguyen, Linh Nguyen, Hung Le
TL;DR
This work addresses multimodal document understanding in retrieval-augmented generation (RAG) by introducing Layout-Aware Graph Modeling (LAGM), a graph-based representation that preserves document layout and the relationships among text, tables, and diagrams. The SuperRAG framework combines LAGM with flexible retrieval strategies (LLM-driven graph traversal and heuristic reasoning over the table of contents, tables, and diagrams) and with graph augmentation to support accurate, scalable multimodal QA. In evaluations on DOCBENCH and SPIQA, SuperRAG improves over non-layout RAG and strong baselines, supporting the value of layout-aware structure for information retrieval (IR) and reasoning. The system is designed for practical business use: it is a modular pipeline with a demo, covering parsing, data modeling, IR, and prompt design, and it can be integrated into existing RAG workflows.
Abstract
This paper introduces layout-aware graph modeling for multimodal RAG. Unlike traditional RAG methods, which mostly operate on flat text chunks, the proposed method captures the relationships among modalities with a graph structure. The graph is defined from document layout parsing, so the structure of an input document is retained through the connections among text chunks, tables, and figures. This representation allows the method to handle complex questions that require information from multiple modalities. To evaluate the graph modeling, a flexible RAG pipeline is developed using robust components. Experimental results on four benchmark test sets confirm that layout-aware modeling improves the performance of the RAG pipeline.
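
As a rough illustration of the idea described above, the following is a minimal sketch of a layout graph in which text chunks, tables, and figures are nodes connected by layout relations, and retrieval expands from a matched node to its neighbors. It is not the authors' implementation: the names `Node`, `LayoutGraph`, `neighborhood`, and `build_example`, the undirected edge model, and the toy document are all assumptions made for this example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str      # "section", "text", "table", or "figure" (assumed node types)
    content: str


@dataclass
class LayoutGraph:
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, kind, content):
        self.nodes[node_id] = Node(node_id, kind, content)

    def add_edge(self, src, dst):
        # Stored as undirected so traversal can move in either direction.
        self.edges[src].add(dst)
        self.edges[dst].add(src)

    def neighborhood(self, seed, max_hops=1):
        """Breadth-first expansion from a seed node, up to max_hops edges away."""
        seen = {seed}
        queue = deque([(seed, 0)])
        while queue:
            node_id, depth = queue.popleft()
            if depth == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in sorted(self.edges[node_id]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        # Sorted by id so the result order is deterministic.
        return [self.nodes[n] for n in sorted(seen)]


def build_example():
    # Toy document, invented for illustration only.
    g = LayoutGraph()
    g.add_node("sec1", "section", "Results")
    g.add_node("txt1", "text", "The table below summarizes the comparison.")
    g.add_node("tab2", "table", "(toy table content)")
    g.add_node("fig1", "figure", "(toy figure caption)")
    g.add_edge("sec1", "txt1")   # section contains the text chunk
    g.add_edge("sec1", "fig1")   # section contains the figure
    g.add_edge("txt1", "tab2")   # text chunk refers to the table
    return g
```

In this sketch, a query that matches the text chunk `txt1` would also pull in the table it refers to and its parent section with `build_example().neighborhood("txt1", max_hops=1)`, whereas a flat-chunk retriever would return the text chunk alone. Raising `max_hops` to 2 additionally reaches the figure in the same section.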
