Alquist 5.0: Dialogue Trees Meet Generative Models. A Novel Approach for Enhancing SocialBot Conversations

Ondřej Kobza, Jan Čuhel, Tommaso Gargiani, David Herel, Petr Marek

TL;DR

Alquist 5.0 advances SocialBot dialogue by integrating Barista, a BlenderBot 3-based Neural Response Generator (NRG), into a modular, topic-driven architecture that couples scripted dialogues with generative models. It introduces a suite of novel Barista components (fast classifiers, FiD-inspired knowledge extraction, and a refined query-generation pipeline) plus VicuChat for knowledge-aware responses, deployed together with a multimodal UI and APIHub knowledge sources. A combined safety framework fuses fastText classifiers with rule-based checks to improve safety in open-domain chats. The work demonstrates improved conversational quality, reduced repetition, and responsive knowledge access on multimodal devices, contributing practical techniques for safe, engaging SocialBot experiences. The approach offers scalable pathways for integrating LLMs into dialogue management through an LLM loop and hybrid dialogue strategies, benefiting both user experience and safe deployment.

Abstract

We present our SocialBot, Alquist 5.0, developed for the Alexa Prize SocialBot Grand Challenge 5. Building upon previous versions of our system, we introduce the NRG Barista and outline several innovative approaches for integrating Barista into our SocialBot, improving the overall conversational experience. Additionally, we extend our SocialBot to support multimodal devices. This paper offers insights into the development of Alquist 5.0, which meets evolving user expectations while maintaining empathetic and knowledgeable conversational abilities across diverse topics.

Paper Structure (47 sections, 11 figures, 7 tables)

Figures (11)

  • Figure 1: The system architecture builds on the work of Konrád et al. (2021), with the main emphasis on the Barista neural response generator.
  • Figure 2: Visualization of Barista.
  • Figure 3: Test results for Barista with Vicuna trained with LoRA, fully fine-tuned Vicuna, pure Vicuna, and BlenderBot 3. The NRGs were rated on a scale of 0-5, where 5 is the best rating.
  • Figure 4: Proportion of unsafe content in % (lower is better) across four datasets (DiaSafety, StereoSet, TweetOffensive, CyberBully) for each language model: DialoGPT-Small, DialoGPT-Medium, BlenderBot3-400M, BlenderBot3-3B. "Overall" is the average of the values for a given language model.
  • Figure 5: Visualization of the combined approach.
  • ...and 6 more figures