Walert: Putting Conversational Search Knowledge into Action by Building and Evaluating a Large Language Model-Powered Chatbot

Sachin Pathiyan Cherumanal, Lin Tian, Futoon M. Abushaqra, Angel Felipe Magnossao de Paula, Kaixin Ji, Danula Hettiachchi, Johanne R. Trippas, Halil Ali, Falk Scholer, Damiano Spina

TL;DR

This work addresses the challenge of building reliable, domain-specific conversational agents with LLMs by comparing two pragmatic approaches, Intent-Based (IB) and Retrieval-Augmented Generation (RAG), using a manually curated, FAQ-driven knowledge base (KB) for RMIT University computer science programs. It introduces a testbed of 106 questions and 120 knowledge-base passages, and evaluates retrieval and generation quality using NDCG, BERTScore, and ROUGE-1, with significance tests. Findings show that IB excels on Known and Out-of-KB queries, while RAG (especially with Dense Passage Retrieval) handles Inferred questions better when given more context, though excessive context raises the risk of hallucination. The study provides practical guidance for practitioners deploying LLM-based chatbots, highlights evaluation gaps, and shares open resources to bridge expert knowledge and industry needs.
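
As a rough illustration of two of the metrics named above, the following minimal Python sketch computes NDCG@k for a ranked list of retrieved passages and ROUGE-1 F1 for a generated answer. It is not the authors' evaluation code: the function names, the linear-gain form of DCG, and the lowercase whitespace tokenization are all assumptions made for this example, and the paper's exact settings are not stated in this summary.

```python
import math
from collections import Counter


def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_gains, k, judged_gains=None):
    """NDCG@k for one query.

    ranked_gains: relevance grades of the retrieved passages, in rank order.
    judged_gains: grades of all judged passages for the query (hypothetical
                  argument; defaults to the retrieved ones).
    """
    pool = judged_gains if judged_gains is not None else ranked_gains
    ideal_dcg = dcg(sorted(pool, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 (assumes lowercase whitespace tokenization)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a ranking that places the only relevant passage first scores an NDCG of 1.0, while placing it second lowers the score through the log discount. BERTScore is omitted because it requires a pretrained language model rather than simple counting.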

Abstract

Creating and deploying customized applications is crucial for operational success and enriching user experiences in the rapidly evolving modern business world. A prominent facet of modern user experiences is the integration of chatbots or voice assistants. The rapid evolution of Large Language Models (LLMs) has provided a powerful tool to build conversational applications. We present Walert, a customized LLM-based conversational agent able to answer frequently asked questions about computer science degrees and programs at RMIT University. Our demo aims to showcase how conversational information-seeking researchers can effectively communicate the benefits of using best practices to stakeholders interested in developing and deploying LLM-based chatbots. These practices are well-known in our community but often overlooked by practitioners who may not have access to this knowledge. The methodology and resources used in this demo serve as a bridge to facilitate knowledge transfer from experts, address industry professionals' practical needs, and foster a collaborative environment. The data and code of the demo are available at https://github.com/rmit-ir/walert.

Paper Structure (7 sections, 1 figure, 2 tables)

Figures (1)

  • Figure 1: Overall architecture of the two approaches implemented in Walert: IB and RAG.