Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval
Yu Xia, Junda Wu, Sungchul Kim, Tong Yu, Ryan A. Rossi, Haoliang Wang, Julian McAuley
TL;DR
The paper addresses semi-structured retrieval, where user queries impose requirements on both textual attributes and inter-document relations. It introduces Knowledge-Aware Retrieval (KAR), a framework that augments LLM-based query expansion with structured relations from a knowledge graph (KG), uses document texts as rich KG node representations, and applies document-based relation filtering to ground the expansions. Document triples built from the filtered neighbors are used to prompt an LLM for expansion candidates, yielding an expanded query q' for final embedding-based retrieval. Experiments on the STaRK datasets (AMAZON, MAG, PRIME) show that KAR consistently outperforms state-of-the-art text-only and KG-augmented baselines on textual and relational retrieval, using a scalable, zero-shot approach.
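The filtering and triple-construction steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph (DOCS, EDGES), the bag-of-words embed function standing in for a real text embedding model, the prompt layout, and forming q' by simple concatenation are all assumptions made for illustration, and the LLM call itself is left out.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in (assumption) for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy KG: node id -> document text, plus (head, relation, tail) edges.
DOCS = {
    "nikon_f_mount": "nikon f-mount lens system",
    "cam_a": "nikon dslr camera for wildlife photography with fast autofocus",
    "cam_b": "nikon compact camera for travel vlogging",
    "strap": "leather strap accessory",
}
EDGES = [
    ("cam_a", "compatible_with", "nikon_f_mount"),
    ("cam_b", "compatible_with", "nikon_f_mount"),
    ("strap", "sold_with", "nikon_f_mount"),
]

def filtered_triples(query, entity, k=2):
    """Keep the k edges around `entity` whose neighbor document text best matches the query."""
    q = embed(query)
    candidates = []
    for head, rel, tail in EDGES:
        if entity in (head, tail):
            neighbor = tail if head == entity else head
            score = cosine(q, embed(DOCS[neighbor]))
            candidates.append((score, (head, rel, tail)))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [triple for _, triple in candidates[:k]]

def build_prompt(query, triples):
    """Render document triples as text an LLM could be prompted with (layout is assumed)."""
    lines = [f"({DOCS[h]}) -[{r}]-> ({DOCS[t]})" for h, r, t in triples]
    return "Query: " + query + "\nRelated document triples:\n" + "\n".join(lines)

def expand_query(query, expansion):
    """Form q' by appending the LLM-written expansion to the query (assumed scheme)."""
    return query + " " + expansion
```

For the query "camera for wildlife photography compatible with nikon f-mount lenses" and the entity nikon_f_mount, the filter with k=2 keeps the two camera triples and drops the strap, whose document text shares no terms with the query. The string returned by build_prompt would then be sent to an LLM, and the generated expansion passed to expand_query before embedding q' for retrieval.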
Abstract
Large language models (LLMs) have been used to generate query expansions that augment original queries to improve information search. Recent studies also explore providing LLMs with initial retrieval results so that the generated expansions are better grounded in the document corpus. However, these methods mostly focus on enhancing textual similarity between search queries and target documents, overlooking document relations. For queries like "Find me a highly rated camera for wildlife photography compatible with my Nikon F-Mount lenses", existing methods may generate expansions that are semantically similar but structurally unrelated to the user's intent. To handle such semi-structured queries with both textual and relational requirements, in this paper we propose a knowledge-aware query expansion framework that augments LLMs with structured document relations from a knowledge graph (KG). To further address the limitation of entity-based scoring in existing KG-based methods, we leverage document texts as rich KG node representations and use document-based relation filtering in our Knowledge-Aware Retrieval (KAR). Extensive experiments on three datasets from diverse domains show the advantages of our method over state-of-the-art baselines on textual and relational semi-structured retrieval.
