Optimizing SIA Development: A Case Study in User-Centered Design for Estuary, a Multimodal Socially Interactive Agent Framework

Spencer Lin, Miru Jun, Basem Rizk, Karen Shieh, Scott Fisher, Sharon Mozgai

TL;DR

The paper presents Estuary, an open-source, multimodal SIA framework designed for low-latency, real-time interaction that can run off-cloud and on edge devices. Using the Rapid Assessment Process, the authors gather expert feedback from ICT researchers to identify gaps in current tools and to validate Estuary's design choices, including its modular Stage/Pipeline architecture, SocketIO-based data flow, and Unity XR integrations. Key contributions include a detailed comparison of existing frameworks (VHToolkit, Pipecat, NVIDIA ACE), a justification for Estuary's off-cloud, interoperable, and platform-agnostic approach, and a roadmap informed by user feedback that emphasizes openness, extensibility, and usability. The findings highlight Estuary's strengths in meeting contemporary SIA research needs as well as areas for improvement, such as support for additional modalities, simplified setup, and broader platform support. The work is intended to guide future development of SIA tooling and frameworks.

Abstract

This case study presents our user-centered design model for Socially Interactive Agent (SIA) development frameworks through our experience developing Estuary, an open-source multimodal framework for building low-latency, real-time socially interactive agents. We leverage the Rapid Assessment Process (RAP) to collect the thoughts of leading researchers in the field of SIAs regarding the current state of the art for SIA development, as well as their evaluation of how well Estuary may address current research gaps. We achieve this through a series of end-user interviews conducted by a fellow researcher in the community. We hope that the findings of our work will not only assist the continued development of Estuary but also guide the development of other future frameworks and technologies for SIAs.

Paper Structure

This paper contains 25 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Abstract diagram of the Estuary system architecture: a client-server architecture built on the SocketIO protocol, in which data is transferred as JSON and parsed according to defined event types. The two boxes on the right illustrate how Estuary can be extended to any modality through the DataPackets design, and how it can support any model or online service endpoint by extending or configuring the appropriate abstract Stage definition with minimal code.
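
The design described in the Figure 1 caption can be illustrated with a minimal sketch. This is not Estuary's actual code: the class names `DataPacket`, `Stage`, `UppercaseStage`, and `Pipeline`, the `event`/`payload` field names, and the event type strings are all hypothetical, chosen only to show how JSON packets tagged with an event type might flow through stages that subclass a common abstract definition. The SocketIO transport is left out; the sketch only covers the JSON string that would travel over it.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, asdict


@dataclass
class DataPacket:
    """Hypothetical packet: an event type plus a JSON-serializable payload."""
    event: str      # illustrative event type, e.g. "text" or "audio"
    payload: dict

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "DataPacket":
        obj = json.loads(raw)
        return cls(event=obj["event"], payload=obj["payload"])


class Stage(ABC):
    """Hypothetical abstract stage: takes one packet and returns one packet."""
    accepts: str  # the event type this stage handles

    @abstractmethod
    def process(self, packet: DataPacket) -> DataPacket:
        ...


class UppercaseStage(Stage):
    """Toy stand-in for a model or online service endpoint."""
    accepts = "text"

    def process(self, packet: DataPacket) -> DataPacket:
        return DataPacket("text", {"text": packet.payload["text"].upper()})


class Pipeline:
    """Runs a packet through each stage whose event type matches."""

    def __init__(self, stages):
        self.stages = list(stages)

    def handle(self, raw: str) -> str:
        packet = DataPacket.from_json(raw)
        for stage in self.stages:
            # Stages that do not accept this event type leave the packet unchanged.
            if stage.accepts == packet.event:
                packet = stage.process(packet)
        return packet.to_json()


if __name__ == "__main__":
    incoming = DataPacket("text", {"text": "hello"}).to_json()
    print(Pipeline([UppercaseStage()]).handle(incoming))
```

Under these assumptions, supporting a new model or service means writing one more `Stage` subclass, and supporting a new modality means agreeing on one more event type and payload shape. The rest of the pipeline stays untouched.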