Optimizing SIA Development: A Case Study in User-Centered Design for Estuary, a Multimodal Socially Interactive Agent Framework
Spencer Lin, Miru Jun, Basem Rizk, Karen Shieh, Scott Fisher, Sharon Mozgai
TL;DR
The paper presents Estuary, an open-source, multimodal SIA framework designed for low-latency, real-time interaction that can operate off-cloud and on edge devices. Using the Rapid Assessment Process, the authors gather expert feedback from ICT researchers to identify gaps in current tools and validate Estuary's design choices, including its modular Stage/Pipeline architecture, SocketIO-based data flow, and Unity XR integrations. Key contributions include a detailed comparison of existing frameworks (VHToolkit, Pipecat, NVIDIA ACE), a justification for Estuary's off-cloud, interoperable, and platform-agnostic approach, and a roadmap informed by user feedback emphasizing openness, extensibility, and usability. The findings highlight both Estuary's strengths in meeting contemporary SIA research needs and areas for improvement, such as a richer set of modalities, simplified setup, and broader platform support. The work is intended to guide future development of SIA tooling and frameworks.
Abstract
This case study presents our user-centered design model for Socially Interactive Agent (SIA) development frameworks through our experience developing Estuary, an open-source multimodal framework for building low-latency, real-time socially interactive agents. We leverage the Rapid Assessment Process (RAP) to collect the views of leading SIA researchers on the current state of the art in SIA development, as well as their evaluation of how well Estuary may address current research gaps. We achieve this through a series of end-user interviews conducted by a fellow researcher in the community. We hope that our findings will not only assist the continued development of Estuary but also guide the development of other future frameworks and technologies for SIAs.
