Open Assistant Toolkit -- version 2

Sophie Fischer, Federico Rossetto, Carlos Gemmell, Andrew Ramsay, Iain Mackie, Philip Zubel, Niklas Tecklenburg, Jeffrey Dalton

TL;DR

Open Assistant Toolkit v2 introduces a scalable, open-source framework for building task-oriented conversational agents that integrate modular action generation, knowledge-grounded response generation, and live task adaptation. It provides online, offline, and training pipelines, with a Neural Decision Parser (NDP) for constrained action generation and an LLM backbone served through Hugging Face's Text Generation Inference (TGI), all deployed in a Dockerised, low-latency architecture. A dedicated offline pipeline builds TaskGraphs from Common Crawl data, and is complemented by synthetic task generation and a training pipeline for specialised models, enabling rapid domain expansion to real-world multimodal tasks. The work offers deployment-ready tools for researching and commercialising multimodal virtual assistants, and it anticipates future integration with vision-language models and AR-enabled interactions to broaden practical impact.

Abstract

We present the second version of the Open Assistant Toolkit (OAT-v2), an open-source task-oriented conversational system for composing generative neural models. OAT-v2 is a scalable and flexible assistant platform supporting multiple domains and modalities of user interaction. It splits the processing of a user utterance into modular system components, including submodules for action code generation, multimodal content retrieval, and knowledge-augmented response generation. Developed over multiple years of the Alexa TaskBot challenge, OAT-v2 is a proven system that supports scalable and robust experimentation in both research settings and real-world deployment. OAT-v2 provides open models and software for research and commercial applications to enable the future of multimodal virtual assistants across diverse applications and types of rich interaction.

Paper Structure (13 sections, 5 figures, 1 table)

This paper contains 13 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: OAT-v2 processes user utterances with different submodules. The system generates a system action based on the current task, system state and user utterance. Then, a knowledge-augmented generator produces a response based on the task and external knowledge. The system internally updates the task and responds to the user.
  • Figure 2: The OAT-v2 code base is structured into online, offline and deployment Docker containers. Coloured components show up in the logs when running the online system.
  • Figure 3: NDP architecture. The encoder embeds possible functions depending on the task state and the previous system prompt. The decoder prefix includes the user turn. The decoder then auto-regressively generates a system action in the action target space.
  • Figure 4: OAT-v2's composed system for response generation depends on NDP action generation. Depending on the action type, the Orchestrator component calls different functionality such as specialised models, LLMs, or predefined logic flows. If the NDP action code is unknown, a fallback LLM generates a fluent response and communicates the system's abilities to the user.
  • Figure 5: OAT-v2's offline pipeline ingests multimodal input from Common Crawl and parses it into subcomponents. The pipeline then transforms the data into augmented TaskGraphs. Other corpora, including category and knowledge corpora, are also written out for later use by the online system.
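
The dispatch described in the Figure 4 caption (route a generated action code to a handler, and fall back to an LLM when the code is unknown) can be sketched roughly as below. This is a minimal illustration, not OAT-v2's actual code: the `Session` class, the handler functions, the `name(args)` action string format, and the canned fallback reply standing in for a real LLM call are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Session:
    """Hypothetical session state: the current task's steps and a step pointer."""
    steps: List[str]
    current_step: int = 0


# A handler receives the session and the action's arguments and returns a response.
Handler = Callable[[Session, List[str]], str]


def handle_next(session: Session, args: List[str]) -> str:
    # Predefined logic flow: advance the step pointer, clamped to the last step.
    session.current_step = min(session.current_step + 1, len(session.steps) - 1)
    return session.steps[session.current_step]


def handle_repeat(session: Session, args: List[str]) -> str:
    # Predefined logic flow: read the current step again without moving.
    return session.steps[session.current_step]


def fallback_llm(session: Session, utterance: str) -> str:
    # Stand-in for the fallback LLM (assumption): a real system would call a
    # model here; this sketch returns a canned reply listing known actions.
    known = ", ".join(sorted(HANDLERS))
    return f"Sorry, I can't do that yet. I can handle: {known}."


def parse_action(action_code: str) -> Tuple[str, List[str]]:
    """Split an action string such as 'next()' or 'select(2)' into name and args."""
    name, _, rest = action_code.partition("(")
    args = [a.strip() for a in rest.rstrip(")").split(",") if a.strip()]
    return name.strip(), args


# Hypothetical action registry; the names are illustrative only.
HANDLERS: Dict[str, Handler] = {"next": handle_next, "repeat": handle_repeat}


def orchestrate(session: Session, utterance: str, action_code: str) -> str:
    """Route a generated action code to its handler, or to the fallback if unknown."""
    name, args = parse_action(action_code)
    handler = HANDLERS.get(name)
    if handler is None:
        return fallback_llm(session, utterance)
    return handler(session, args)
```

In this sketch, adding a new capability means registering one more entry in `HANDLERS`, while any action code outside the registry is routed to the fallback, which also tells the user what the system can do.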