ReTAG: Retrieval-Enhanced, Topic-Augmented Graph-Based Global Sensemaking
Boyoung Kim, Dosung Lee, Sumin An, Jinseong Jeong, Paul Hongsuck Seo
TL;DR
ReTAG addresses global sensemaking by constructing a contextualized graph $G_c=(\mathcal{V},\mathcal{E})$ over a corpus and organizing it into hierarchical communities. It extends prior graph-based sensemaking in two ways: topic augmentation builds topic-specific subgraphs, and retrieval augmentation selects the most relevant summaries during answer generation, all within a $W$-token context window across $L$ levels. On the Podcast and News Articles datasets, the approach yields significant gains in comprehensiveness and diversity while substantially reducing inference time, with speed improvements of up to $90.3\%$ and higher relevance when the topic-augmented and retrieval-augmented configurations are compared. By integrating topic mining, keyword-expanded retrieval, and end-to-end prompting, ReTAG enables scalable, high-quality global sensemaking across large document collections.
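The retrieval-augmentation idea, selecting the most relevant community summaries so that they fit within a $W$-token window, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Summary` type, the keyword-overlap `relevance` score, the whitespace-based `count_tokens`, and the greedy packing in `select_summaries` are all hypothetical stand-ins chosen only to make the budgeted-selection step concrete.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Hypothetical container for one community summary."""
    community_id: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def relevance(query_keywords: set, summary: Summary) -> float:
    # Assumption: fraction of query keywords that appear in the summary.
    # The actual retriever used by ReTAG may score relevance differently.
    words = {w.strip(".,;:!?").lower() for w in summary.text.split()}
    return len(query_keywords & words) / max(len(query_keywords), 1)


def select_summaries(query_keywords: set, summaries: list, window_tokens: int) -> list:
    """Greedily keep the highest-scoring summaries that fit in the window."""
    ranked = sorted(
        summaries,
        key=lambda s: relevance(query_keywords, s),
        reverse=True,
    )
    selected, used = [], 0
    for summary in ranked:
        cost = count_tokens(summary.text)
        # Skip any summary that would overflow the W-token budget.
        if used + cost <= window_tokens:
            selected.append(summary)
            used += cost
    return selected
```

Calling `select_summaries({"privacy", "policy"}, summaries, window_tokens=W)` returns only the summaries that share keywords with the query and fit the budget, so the generation prompt stays within the context window instead of consuming every community summary.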
Abstract
Recent advances in question answering have led to substantial progress in tasks such as multi-hop reasoning. However, global sensemaking, that is, answering questions by synthesizing information from an entire corpus, remains a significant challenge. A prior graph-based approach to global sensemaking lacks retrieval mechanisms and topic specificity, and it incurs high inference costs. To address these limitations, we propose ReTAG, a Retrieval-Enhanced, Topic-Augmented Graph framework that constructs topic-specific subgraphs and retrieves the relevant summaries for response generation. Experiments show that ReTAG improves response quality while significantly reducing inference time compared to the baseline. Our code is available at https://github.com/bykimby/retag.
