ECHO: An Open Research Platform for Evaluation of Chat, Human Behavior, and Outcomes
Jiqun Liu, Nischal Dinesh, Ran Yu
TL;DR
ECHO addresses the need for cohesive, longitudinal evaluation of modern information-access systems by offering an open, low-code platform that supports end-to-end mixed-method studies of chat and Web search interactions. Administrators configure studies through a dashboard. Participants complete consent, surveys, and task sessions using either a chat interface powered by LLM APIs or a search interface powered by a Search API, and all interactions are logged and exportable. Key contributions include a serverless, dual-modality architecture; flexible API integration; in-situ surveys; and comprehensive data export, all designed to lower barriers to cross-disciplinary, reproducible human-centered AI research. The platform supports trajectory-level analyses of user experience, learning, and judgment formation, enabling scalable studies across IR, HCI, and the cognitive and social sciences that can inform the evaluation and improvement of information-access technologies.
Abstract
ECHO (Evaluation of Chat, Human behavior, and Outcomes) is an open research platform designed to support reproducible, mixed-method studies of human interaction with both conversational AI systems and Web search engines. It enables researchers from different disciplines to orchestrate end-to-end experimental workflows that integrate consent and background surveys, chat-based and search-based information-seeking sessions, writing or judgment tasks, and pre- and post-task evaluations within a unified, low-code framework. ECHO logs fine-grained interaction traces and participant responses, and exports structured datasets for downstream analysis. By supporting both chat and search alongside flexible evaluation instruments, ECHO lowers technical barriers to studying learning, decision making, and user experience across different information access paradigms, allowing researchers from information retrieval, HCI, and the social sciences to conduct scalable and reproducible human-centered AI evaluations.
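To make the logging-and-export idea concrete, the following is a minimal illustrative sketch and not ECHO's actual schema or code. The `InteractionEvent` record, its field names, and the `export_csv` helper are all hypothetical, assumed here only to show how chat and search events from one session might share a single structured format that exports cleanly for downstream analysis.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class InteractionEvent:
    """Hypothetical log record; field names are assumptions, not ECHO's schema."""
    participant_id: str
    study_id: str
    modality: str       # "chat" or "search"
    event_type: str     # e.g. "query", "message", "click", "survey_response"
    timestamp_ms: int   # time of the event in milliseconds
    payload: str        # free-text content of the event


def export_csv(events):
    """Serialize events to CSV text, ordered by participant and then by time."""
    buf = io.StringIO()
    columns = [f.name for f in fields(InteractionEvent)]
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for event in sorted(events, key=lambda e: (e.participant_id, e.timestamp_ms)):
        writer.writerow(asdict(event))
    return buf.getvalue()


if __name__ == "__main__":
    log = [
        InteractionEvent("p1", "s1", "search", "query", 200, "llm bias"),
        InteractionEvent("p1", "s1", "chat", "message", 100, "hello"),
    ]
    print(export_csv(log))
```

Keeping both modalities in one flat record type, ordered by time within each participant, is one way to support the trajectory-level analyses described above; a real deployment would likely need richer payloads than a single string.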
