LEXI: Large Language Models Experimentation Interface

Guy Laban; Tomer Laban; Hatice Gunes

TL;DR

LEXI addresses a critical gap in human-agent interaction (HAI) research by providing an open-source GUI platform for deploying and controlling LLM-powered agents in social interaction experiments. It enables standardized experiment management, agent construction, and integrated data collection (logs and questionnaires), enhancing reproducibility and cross-disciplinary access. A proof-of-concept study shows that empathetic agents are perceived as more social and capable, with positive effects on user mood and engagement, supporting the platform’s validity for behavioral research. The work emphasizes open science and ethical considerations, and outlines a roadmap of future features to broaden methodological control and model diversity, enabling deeper investigations into human-agent interactions.

Abstract

Recent developments in Large Language Models (LLMs) mark a significant moment in the research and development of social interactions with artificial agents. These agents are widely deployed in a variety of settings, with potential impact on users. However, the study of social interactions with agents powered by LLMs is still emerging, limited by access to the technology and to data, the absence of standardised interfaces, and the difficulty of establishing controlled experimental setups using the currently available business-oriented platforms. To address these gaps, we developed LEXI, LLMs Experimentation Interface, an open-source tool enabling the deployment of artificial agents powered by LLMs in social interaction behavioural experiments. Using a graphical interface, LEXI allows researchers to build agents and deploy them in experimental setups along with forms and questionnaires while collecting interaction logs and self-reported data. The outcomes of usability testing indicate LEXI's broad utility, high usability and minimal mental workload requirement, with distinctive benefits observed across disciplines. A proof-of-concept study exploring the tool's efficacy in evaluating social human-agent interactions (HAIs) was conducted, resulting in high-quality data. A comparison of empathetic versus neutral agents indicated that people perceive empathetic agents as more social, and write longer and more positive messages towards them.

Paper Structure (20 sections, 5 figures, 1 table)

This paper contains 20 sections, 5 figures, 1 table.

Introduction
The State of the art
Gaps and Goals
The Current Tool
Researcher side
Accessibility and Open-Source
Experiment Management
Building and Managing Agents
Building Forms and Questionnaires
Participant Side
Registration and screening data
Interaction
Usability Testing
Methods
Quantitative Results
...and 5 more sections

Figures (5)

Figure 1: An interface map that explains the 'Admin Dashboard' of LEXI, detailing the interface and functionalities available to researchers for managing experiments, agents, and forms. Yellow boxes indicate information communicated on these pages.
Figure 2: Left to right: (1) The 'Experiment Management' page when adding a new experiment. (2) The 'Agents Management' page when adding a new agent.
Figure 3: The 'Forms Management' section. Existing forms are displayed on the left, the form currently being edited by the researcher is in the middle, and editing fields are on the right.
Figure 4: In the Interaction page, participants can change the font size on the left. They can communicate with the agent through the text bar at the bottom, annotate the agent's messages using the like/dislike buttons, and conclude the interaction by pressing the 'Finish' button on the left.
Figure 5: Left to right: (1) Tasks' Mean Mental Workload by Participants' Research Background. (2) Tasks' Duration by Participants' Research Background.
