Asynchronous Tool Usage for Real-Time Agents

Antonio A. Ginart, Naveen Kodali, Jason Lee, Caiming Xiong, Silvio Savarese, John Emmons

TL;DR

This work presents a conceptual framework and practical tools for building AI agents capable of fluid, multitasking interactions. At its core is an event-driven finite-state machine architecture for agent execution and prompting, integrated with automatic speech recognition and text-to-speech.

Abstract

While frontier large language models (LLMs) are capable tool-using agents, current AI systems still operate in a strict turn-based fashion, oblivious to the passage of time. This synchronous design forces user queries and tool use to occur sequentially, which prevents these systems from multitasking and reduces interactivity. To address this limitation, we introduce asynchronous AI agents capable of parallel processing and real-time tool use. Our key contribution is an event-driven finite-state machine architecture for agent execution and prompting, integrated with automatic speech recognition and text-to-speech. Drawing inspiration from concepts originally developed for real-time operating systems, this work presents both a conceptual framework and practical tools for creating AI agents capable of fluid, multitasking interactions.
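
To make the idea of an event-driven finite-state machine concrete, the following is a minimal Python sketch and not the authors' implementation. The state names, event kinds, transition table, and the `AsyncAgent`, `slow_tool`, and `demo` names are all hypothetical, chosen only for illustration. It shows a dispatcher that keeps handling user events while a slow tool call runs in the background and reports back as just another event.

```python
import asyncio
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    SPEAKING = auto()


# Hypothetical transition table: (current state, event kind) -> next state.
TRANSITIONS = {
    (State.IDLE, "user_speech"): State.LISTENING,
    (State.LISTENING, "user_done"): State.SPEAKING,
    (State.SPEAKING, "speech_done"): State.IDLE,
    (State.SPEAKING, "user_speech"): State.LISTENING,  # user interrupts the agent
    (State.IDLE, "tool_result"): State.SPEAKING,       # agent reports the result
}


class AsyncAgent:
    def __init__(self):
        self.state = State.IDLE
        self.events = asyncio.Queue()  # single queue shared by all event sources
        self.log = []                  # (event kind, resulting state name) pairs
        self.pending = set()           # keeps references to running tool tasks

    async def slow_tool(self, name, delay):
        # Stand-in for a real tool call; the result comes back as an event.
        await asyncio.sleep(delay)
        await self.events.put(("tool_result", name))

    def start_tool(self, name, delay):
        task = asyncio.create_task(self.slow_tool(name, delay))
        self.pending.add(task)
        task.add_done_callback(self.pending.discard)

    async def run(self):
        while True:
            kind, payload = await self.events.get()
            if kind == "stop":
                break
            if kind == "tool_call":
                # Launch the tool without blocking the dispatcher.
                self.start_tool(payload, delay=0.05)
                self.log.append((kind, self.state.name))
                continue
            next_state = TRANSITIONS.get((self.state, kind))
            if next_state is None:
                # Event is not valid in the current state; this sketch drops it.
                self.log.append((kind, "ignored"))
                continue
            self.state = next_state
            self.log.append((kind, self.state.name))


async def demo():
    agent = AsyncAgent()
    runner = asyncio.create_task(agent.run())
    scripted = [
        ("tool_call", "search_flights"),
        ("user_speech", None),
        ("user_done", None),
        ("speech_done", None),
    ]
    for event in scripted:
        await agent.events.put(event)
    await asyncio.sleep(0.2)  # give the background tool time to finish
    await agent.events.put(("speech_done", None))
    await agent.events.put(("stop", None))
    await runner
    return agent.log


if __name__ == "__main__":
    for entry in asyncio.run(demo()):
        print(entry)
```

In this sketch the conversation continues through a full listen-and-speak cycle before the tool result arrives, which is the multitasking behavior the abstract describes. A tool result that arrives outside the `IDLE` state is simply dropped here; a fuller design would presumably buffer it until the agent can speak.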

Paper Structure

This paper contains 28 sections, 5 equations, 3 figures.

Figures (3)

  • Figure 1: Hypothetical voice call between a user and a travel agent with asynchronous tool use.
  • Figure 2: Architecture diagram for the asynchronous execution environment with voice peripherals.
  • Figure 3: Implementation and control flow diagram for the asynchronous execution environment.