Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval

Yu Xia, Junda Wu, Sungchul Kim, Tong Yu, Ryan A. Rossi, Haoliang Wang, Julian McAuley

TL;DR

The paper addresses semi-structured retrieval where user queries require both textual attributes and inter-document relations. It introduces Knowledge-Aware Retrieval (KAR), a framework that augments LLM-based query expansions with structured KG relations while using document texts as rich KG node representations and applying document-based relation filtering to ground expansions. Document triples are constructed from filtered neighbors and used to prompt LLMs to generate expansion candidates, producing q' for final embedding-based retrieval. Experiments on STaRK datasets (AMAZON, MAG, PRIME) show that KAR consistently outperforms state-of-the-art text-only and KG-augmented baselines, demonstrating strong performance in textual and relational retrieval with a scalable, zero-shot approach.

Abstract

Large language models (LLMs) have been used to generate query expansions that augment original queries and improve information search. Recent studies also explore providing LLMs with initial retrieval results to generate query expansions that are better grounded in the document corpus. However, these methods mostly focus on enhancing textual similarity between search queries and target documents, overlooking document relations. For queries like "Find me a highly rated camera for wildlife photography compatible with my Nikon F-Mount lenses", existing methods may generate expansions that are semantically similar but structurally unrelated to user intent. To handle such semi-structured queries with both textual and relational requirements, in this paper we propose a knowledge-aware query expansion framework that augments LLMs with structured document relations from a knowledge graph (KG). To further address the limitation of entity-based scoring in existing KG-based methods, we leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR). Extensive experiments on three datasets from diverse domains show the advantages of our method over state-of-the-art baselines on textual and relational semi-structured retrieval.
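
The pipeline summarized in the TL;DR (filter KG neighbors by document text, build document triples, prompt an LLM, then retrieve with the expanded query q') can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy corpus, the relation names, the bag-of-words `embed` function, the similarity-based choice of seed documents, the prompt wording, and the `echo_llm` stand-in are all assumptions made for illustration; a real system would use a dense encoder and an actual LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: every KG node is a document with its own text.
DOCS = {
    "d1": "Nikon D500 DSLR camera, highly rated for wildlife photography",
    "d2": "Nikon F-Mount 200-500mm telephoto lens",
    "d3": "Compact point-and-shoot camera with fixed zoom lens",
    "d4": "Camera tripod for studio portraits",
}
# Hypothetical KG relations as (head, relation, tail) over document ids.
EDGES = [
    ("d2", "also_bought", "d1"),
    ("d2", "also_bought", "d4"),
    ("d3", "also_viewed", "d4"),
]

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def rank(query, top_n):
    """Embedding-based retrieval over the whole corpus."""
    q = embed(query)
    ids = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    return ids[:top_n]

def filtered_triples(query, seeds, k):
    """Document-based relation filtering: score each neighbor by its
    document text against the query and keep the top-k per seed."""
    q = embed(query)
    triples = []
    for seed in seeds:
        neighbors = [(r, t) for h, r, t in EDGES if h == seed]
        neighbors += [(r, h) for h, r, t in EDGES if t == seed]
        neighbors.sort(key=lambda rn: cosine(q, embed(DOCS[rn[1]])), reverse=True)
        triples += [(DOCS[seed], r, DOCS[n]) for r, n in neighbors[:k]]
    return triples

def echo_llm(prompt):
    """Stand-in for an LLM call: returns the knowledge part of the prompt."""
    return prompt.split("Knowledge:\n", 1)[1]

def kar_retrieve(query, generate=echo_llm, k=1, top_n=2):
    seeds = rank(query, top_n)  # assumption: seeds come from a first-pass retrieval
    triples = filtered_triples(query, seeds, k)
    knowledge = "\n".join(f"({h}) -[{r}]-> ({t})" for h, r, t in triples)
    prompt = f"Query: {query}\nWrite a passage answering the query.\nKnowledge:\n{knowledge}"
    expanded = query + " " + generate(prompt)  # q' = original query + expansion
    return rank(expanded, top_n), triples
```

Calling `kar_retrieve("highly rated camera for wildlife photography compatible with Nikon F-Mount lenses")` returns the ranked document ids together with the triples that were shown to the LLM stand-in. With `k=1`, the tripod document is filtered out of the lens's neighbors because its text is less similar to the query than the camera's text, which is the grounding effect the filtering step is meant to provide.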

Paper Structure

This paper contains 25 sections, 7 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Example query expansions generated by HyDE [gao-etal-2023-precise], RAR [shen-etal-2024-retrieval], and KAR (Ours) given a semi-structured product search query with both textual and relational requirements [wu2024stark]. While HyDE and RAR enrich the textual information, e.g., "wildlife" and "highly rated", they make up incorrect document relations, e.g., compatibility of "Nikon Coolpix P1000" with "F-Mount lenses". In contrast, our KAR utilizes document relations from the knowledge graph, e.g., customers bought "Nikon Z7 II" and "F-Mount lenses" together, to generate semantically similar and structurally related query expansions.
  • Figure 2: Overview of our knowledge-aware query expansion framework illustrated with an example academic paper search query with textual and relational requirements.
  • Figure 3: Influence of different values of $k$ for filtered top-$k$ neighbors in KAR.
  • Figure 4: Influence of sampled query expansions $n$.
  • Figure 5: Latency comparison of query expansions.