KaPQA: Knowledge-Augmented Product Question-Answering

Swetha Eppalapally, Daksh Dangi, Chaithra Bhat, Ankita Gupta, Ruiyi Zhang, Shubham Agarwal, Karishma Bagga, Seunghyun Yoon, Nedim Lipka, Ryan A. Rossi, Franck Dernoncourt

TL;DR

KaPQA addresses the lack of domain-specific QA benchmarks by introducing two datasets built from Photoshop and Acrobat HelpX content, together with a knowledge-driven RAG-QA framework that uses knowledge-base triples to reformulate queries. Grounding queries in domain knowledge is intended to improve both retrieval and long-form answer generation. Experiments show slight improvements over standard RAG-QA baselines, which also underscores how difficult real-world product QA is. Key findings include the importance of a high-precision triple retriever and the effect of language model choice (e.g., GPT-3.5 vs GPT-4o) on reformulation quality and retrieval. The work provides benchmarks for enterprise QA and shows how knowledge augmentation can narrow the gap between generic RAG-QA methods and industry-specific needs, while leaving room for improvement in robustness and in the evaluation of long-form outputs.

Abstract

Question-answering for domain-specific applications has recently attracted much interest due to the latest advancements in large language models (LLMs). However, accurately assessing the performance of these applications remains a challenge, mainly due to the lack of suitable benchmarks that effectively simulate real-world scenarios. To address this challenge, we introduce two product question-answering (QA) datasets focused on Adobe Acrobat and Photoshop products to help evaluate the performance of existing models on domain-specific product QA tasks. Additionally, we propose a novel knowledge-driven RAG-QA framework to enhance the performance of the models in the product QA task. Our experiments demonstrated that inducing domain knowledge through query reformulation allowed for increased retrieval and generative performance when compared to standard RAG-QA methods. This improvement, however, is slight, and thus illustrates the challenge posed by the datasets introduced.

Paper Structure (26 sections, 7 figures, 7 tables)

Figures (7)

  • Figure 1: The proposed framework. 1a depicts the main RAG-QA pipeline, consisting of a retriever and a generator, along with the proposed query reformulation sub-pipeline. 1b gives a detailed view of the components of the sub-pipeline. The process starts with the generation of knowledge base triples using the Triples Generator. Next, the Triple Retriever retrieves all triples matching the user query, and the Relevance Classifier classifies them by their relevance to the original query. Finally, the Query Enhancer reformulates the query.
  • Figure 2: GEval score as a function of the position at which the gold document is passed in as context, over the Acrobat test set.
  • Figure 3: GEval score as a function of the position at which the gold document is passed in as context, over the Photoshop test set.
  • Figure 4: NDCG scores for our proposed model, the DPR baseline, and query reformulation without triples using an LLM (noTrip) over the Acrobat test set.
  • Figure 5: GEval scores for our proposed model, the DPR baseline, and query reformulation without triples using an LLM (noTrip) over the Acrobat test set.
  • ...and 2 more figures
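
The query reformulation sub-pipeline described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the source does not say how each component is built, so simple token overlap stands in for the Triple Retriever and the Relevance Classifier, string concatenation stands in for the Query Enhancer, and a small hand-written knowledge base replaces the output of the Triples Generator. All names (`Triple`, `retrieve_triples`, `is_relevant`, `enhance_query`, `reformulate`, `KB`) and the example triples are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) knowledge-base triple."""
    subject: str
    relation: str
    obj: str

    def text(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"


def tokens(s: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in s.split() if w.strip(".,?!")}


def retrieve_triples(query: str, kb: list[Triple], top_k: int = 3) -> list[Triple]:
    """Stand-in Triple Retriever: rank triples by token overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(t.text())), t) for t in kb]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:top_k]]


def is_relevant(query: str, triple: Triple, min_overlap: int = 2) -> bool:
    """Stand-in Relevance Classifier: keep triples sharing enough tokens."""
    return len(tokens(query) & tokens(triple.text())) >= min_overlap


def enhance_query(query: str, triples: list[Triple]) -> str:
    """Stand-in Query Enhancer: append the relevant triples to the query."""
    if not triples:
        return query
    facts = "; ".join(t.text() for t in triples)
    return f"{query} [context: {facts}]"


def reformulate(query: str, kb: list[Triple]) -> str:
    """Retrieve candidate triples, filter by relevance, then rewrite the query."""
    candidates = retrieve_triples(query, kb)
    relevant = [t for t in candidates if is_relevant(query, t)]
    return enhance_query(query, relevant)


# Hypothetical triples; in the described framework these would come
# from the Triples Generator rather than being written by hand.
KB = [
    Triple("Crop tool", "is located in", "Tools panel"),
    Triple("Redact tool", "removes", "sensitive text"),
    Triple("Layer mask", "hides", "parts of a layer"),
]

if __name__ == "__main__":
    print(reformulate("How do I use the crop tool?", KB))
```

In this sketch the reformulated query would then be handed to the document retriever of the main RAG-QA pipeline in place of the original query. The separate relevance filter after retrieval mirrors the two-stage design in the caption: the retriever casts a wide net, and the classifier discards triples that match only loosely.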