Estuary: A Framework For Building Multimodal Low-Latency Real-Time Socially Interactive Agents

Spencer Lin, Basem Rizk, Miru Jun, Andy Artze, Caitlin Sullivan, Sharon Mozgai, Scott Fisher

TL;DR

Estuary is a multimodal framework that facilitates the development of low-latency, real-time Socially Interactive Agents (SIAs) through a modular and interoperable architecture that seamlessly incorporates current and future components.

Abstract

The rise in capability and ubiquity of generative artificial intelligence (AI) technologies has enabled their application to the field of Socially Interactive Agents (SIAs). Despite rising interest in modern AI-powered components for real-time SIA research, substantial friction remains due to the absence of a standardized and universal SIA framework. To address this absence, we developed Estuary: a multimodal (text, audio, and soon video) framework which facilitates the development of low-latency, real-time SIAs. Estuary seeks to reduce repeated work between studies and to provide a flexible platform that can be run entirely off-cloud to maximize configurability, controllability, reproducibility of studies, and speed of agent response times. We achieve this with a robust, modular, and interoperable architecture that seamlessly incorporates current and future components.

Paper Structure

This paper contains 11 sections and 3 figures.

Figures (3)

  • Figure 1: Estuary's various capabilities in Augmented Reality and Virtual Reality environments.
  • Figure 2: Estuary incorporating mesh classification capabilities into multimodal interaction.
  • Figure 3: System diagram of the server and client.