Question-to-Question Retrieval for Hallucination-Free Knowledge Access: An Approach for Wikipedia and Wikidata Question Answering
Santhosh Thottingal
TL;DR
This work tackles hallucinations and inefficiencies in open-domain question answering over knowledge bases by replacing answer generation with a question-to-question retrieval mechanism. An instruction-tuned LLM generates a comprehensive set of questions for each content unit (Wikipedia paragraphs and Wikidata triples). These questions are embedded in a dense vector store, and a user query is matched against them by cosine similarity, so the system returns the original text or structured data rather than generated text. The key contributions are a scalable, SPARQL-free approach that achieves high similarity scores (consistently above $0.9$ for relevant pairs) and removes the need for on-the-fly generation, giving faster, more reliable responses with multimodal retrieval capabilities. The method shows strong potential for scalable, low-latency QA over large knowledge bases. The paper discusses practical considerations around vector-store size, update strategies, and language support, and acknowledges limitations in multi-hop reasoning and in the coverage of generated questions. Overall, it proposes a practical, hallucination-averse alternative to RAG pipelines that uses question-based indexing to give precise, fast access to factual content in Wikipedia and Wikidata.
Abstract
This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata by performing "question-to-question" matching and retrieval from a dense vector embedding store. Instead of embedding document content, we generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM. These questions are vector-embedded and stored, each mapping to its corresponding content. Vector embeddings of user queries are then matched against this question vector store. The stored question with the highest similarity score leads to direct retrieval of the associated article content, eliminating the need for answer generation. Our method achieves high cosine similarity (> 0.9) for relevant question pairs, enabling highly precise retrieval. This approach offers several advantages, including computational efficiency, rapid response times, and increased scalability. We demonstrate its effectiveness on Wikipedia and Wikidata, including multimedia content through structured fact retrieval from Wikidata, opening up new pathways for multimodal question answering.
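The retrieval flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a dense embedding model, the questions are hand-written rather than generated by an LLM, and the content units, the names `CONTENT`, `GENERATED_QUESTIONS`, `INDEX`, and `answer`, and the threshold of 0.5 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical content units: one Wikipedia-style paragraph, one Wikidata-style triple.
CONTENT = {
    "p1": "Paris is the capital and most populous city of France.",
    "t1": "(Eiffel Tower, height, 330 metres)",
}

# Questions an instruction-tuned LLM might generate per unit (hand-written here).
GENERATED_QUESTIONS = [
    ("What is the capital of France?", "p1"),
    ("Which city is the most populous in France?", "p1"),
    ("How tall is the Eiffel Tower?", "t1"),
    ("What is the height of the Eiffel Tower?", "t1"),
]

# The "vector store": each question embedding maps back to its content unit.
INDEX = [(embed(q), q, cid) for q, cid in GENERATED_QUESTIONS]


def answer(query, threshold=0.5):
    """Return (content, matched_question, score) for the best match, or None.

    The threshold is an illustrative value for the toy embedding, not the
    0.9 figure reported for real dense embeddings.
    """
    qv = embed(query)
    score, question, cid = max(
        ((cosine(qv, v), q, c) for v, q, c in INDEX), key=lambda x: x[0]
    )
    if score < threshold:
        return None
    # The original content is returned verbatim; nothing is generated.
    return CONTENT[cid], question, score
```

Calling `answer("what is the capital of France")` returns the stored paragraph unchanged together with the matched question and its score, while a query with no close stored question, such as `answer("Who wrote Hamlet?")`, returns `None` instead of a guess. In a real system the toy `embed` would be replaced by a sentence-embedding model and the linear scan by a vector index.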
