Domain-specific Question Answering with Hybrid Search
Dewang Sultania, Zhaoyu Lu, Twisha Naik, Franck Dernoncourt, David Seunghyun Yoon, Sanat Sharma, Trung Bui, Ashok Gupta, Tushar Vatsa, Suhas Suresha, Ishita Verma, Vibha Belavadi, Cheng Chen, Michael Friedrich
TL;DR
The paper addresses domain-specific question answering in enterprise settings with a production-ready, Elasticsearch-based framework that blends dense and sparse retrieval signals using tunable boosts. Each document's final score combines semantic similarity, BM25 relevance, and host-based authority: $\text{score} = \max(\text{matched chunks cosine score}) + \text{bm25\_boost} \times \text{BM25 score} + \text{host\_boost} \times \text{host\_score}$. The boosts and the chunking strategy are tuned empirically to improve retrieval and answer quality. Evaluated on golden and negative datasets using nDCG, groundedness, and answer similarity, the hybrid dense-sparse approach with host boosting outperforms single-method baselines and remains robust against jailbreak and irrelevant queries. The work offers practical guidance for enterprise QA deployments, including scalability on Elasticsearch, tunable parameter regimes, and an evaluation framework, and names broader human evaluation, multilingual support, and multimodal capabilities as future work.
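As a rough illustration of the scoring formula, the sketch below computes the combined score for a document and ranks candidates by it. It is a minimal sketch under stated assumptions, not the authors' Elasticsearch implementation: the function names `hybrid_score` and `rank_documents`, the tuple layout for candidates, and the boost values in the usage example are all hypothetical, and treating a document with no matched chunks as having a semantic score of 0 is an assumption made here.

```python
from typing import List, Sequence, Tuple


def hybrid_score(
    chunk_cosines: Sequence[float],
    bm25_score: float,
    host_score: float,
    bm25_boost: float,
    host_boost: float,
) -> float:
    """Combine the three relevance signals with a linear weighting.

    score = max(chunk cosine scores) + bm25_boost * BM25 + host_boost * host_score
    """
    # Assumption: a document with no matched chunks contributes 0 semantic score.
    semantic = max(chunk_cosines) if chunk_cosines else 0.0
    return semantic + bm25_boost * bm25_score + host_boost * host_score


# Hypothetical candidate layout: (doc_id, chunk cosine scores, BM25 score, host score)
Candidate = Tuple[str, Sequence[float], float, float]


def rank_documents(
    candidates: Sequence[Candidate], bm25_boost: float, host_boost: float
) -> List[str]:
    """Return document ids ordered from highest to lowest combined score."""
    scored = [
        (hybrid_score(cosines, bm25, host, bm25_boost, host_boost), doc_id)
        for doc_id, cosines, bm25, host in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    # Illustrative values only; the boosts are not taken from the paper.
    docs = [
        ("doc_a", [0.9], 0.0, 0.0),
        ("doc_b", [0.5, 0.25], 2.0, 1.0),
    ]
    print(rank_documents(docs, bm25_boost=0.5, host_boost=0.25))
```

In this example `doc_b` ranks above `doc_a` despite its lower cosine similarity, because its BM25 and host contributions outweigh the gap. This is the behavior the boosts are meant to control.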
Abstract
Domain-specific question answering is an evolving field that requires specialized solutions to address unique challenges. In this paper, we show that a hybrid approach combining a fine-tuned dense retriever with keyword-based sparse search methods significantly enhances performance. Our system leverages a linear combination of relevance signals, including cosine similarity from dense retrieval, BM25 scores, and URL host matching, each with tunable boost parameters. Experimental results indicate that this hybrid method outperforms our single-retriever system, achieving improved accuracy while maintaining robust contextual grounding. These findings suggest that integrating multiple retrieval methodologies with weighted scoring effectively addresses the complexities of domain-specific question answering in enterprise settings.
