Open Assistant Toolkit -- version 2
Sophie Fischer, Federico Rossetto, Carlos Gemmell, Andrew Ramsay, Iain Mackie, Philip Zubel, Niklas Tecklenburg, Jeffrey Dalton
TL;DR
Open Assistant Toolkit v2 introduces a scalable, open-source framework for building task-oriented conversational agents that integrate modular action generation, knowledge-grounded response generation, and live task adaptation. It provides online, offline, and training pipelines, with a Neural Decision Parser (NDP) for constrained action generation and an LLM backbone served through Hugging Face's Text Generation Inference (TGI), all deployed in a Dockerised, low-latency architecture. A dedicated offline pipeline builds TaskGraphs from Common Crawl data, and is complemented by synthetic task generation and a training pipeline for specialised models, enabling rapid domain expansion to real-world multimodal tasks. The work offers deployment-ready tools for researching and commercialising multimodal virtual assistants, and it anticipates future integration with vision-language models and AR-enabled interactions to broaden practical impact.
Abstract
We present the second version of the Open Assistant Toolkit (OAT-v2), an open-source task-oriented conversational system for composing generative neural models. OAT-v2 is a scalable and flexible assistant platform supporting multiple domains and modalities of user interaction. It splits the processing of a user utterance into modular system components, including submodules such as action code generation, multimodal content retrieval, and knowledge-augmented response generation. Developed over multiple years of the Alexa TaskBot challenge, OAT-v2 is a proven system that enables scalable and robust experimentation in both experimental and real-world deployments. OAT-v2 provides open models and software for research and commercial use, supporting future multimodal virtual assistants across diverse applications and types of rich interaction.
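As a rough illustration of the modular split described in the abstract, the sketch below chains three stand-in components (action generation, content retrieval, response generation) over a shared turn state. It is a minimal toy under stated assumptions, not OAT-v2's actual code: the names `Turn`, `generate_action`, `retrieve_content`, `generate_response`, and `handle_utterance` are hypothetical, and simple rules and templates stand in for the neural models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """State passed between components for one user utterance (hypothetical)."""
    utterance: str
    action: str = ""
    retrieved: List[str] = field(default_factory=list)
    response: str = ""


def generate_action(turn: Turn) -> Turn:
    # Stand-in for action code generation: a keyword rule, not a neural model.
    text = turn.utterance.lower()
    turn.action = "next_step()" if "next" in text else "search()"
    return turn


def retrieve_content(turn: Turn) -> Turn:
    # Stand-in for multimodal content retrieval over a toy in-memory corpus.
    corpus: Dict[str, List[str]] = {
        "search()": ["Pancake recipe", "Pancake video"],
        "next_step()": ["Step 2: whisk the eggs"],
    }
    turn.retrieved = corpus.get(turn.action, [])
    return turn


def generate_response(turn: Turn) -> Turn:
    # Stand-in for knowledge-augmented response generation: a template
    # filled with whatever the retrieval component returned.
    if turn.retrieved:
        turn.response = "Here is what I found: " + "; ".join(turn.retrieved)
    else:
        turn.response = "Sorry, I could not find anything."
    return turn


# Components run in order; each one can be swapped out independently.
PIPELINE: List[Callable[[Turn], Turn]] = [
    generate_action,
    retrieve_content,
    generate_response,
]


def handle_utterance(utterance: str) -> Turn:
    """Run one utterance through every component in the pipeline."""
    turn = Turn(utterance=utterance)
    for component in PIPELINE:
        turn = component(turn)
    return turn
```

Calling `handle_utterance("What is the next step?")` routes the utterance through all three stages in order. Because each stage only reads and writes the shared `Turn`, a rule-based stub could in principle be replaced by a model-backed component without touching the others, which is the property the modular design is meant to provide.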
