AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents

Yuxuan Lu, Ting-Yao Hsu, Hansu Gu, Limeng Cui, Yaochen Xie, William Headden, Bingsheng Yao, Akash Veeragouni, Jiapeng Liu, Sreyashi Nag, Jessie Wang, Dakuo Wang

TL;DR

This work presents AgentA/B, a novel system that leverages Large Language Model-based autonomous agents (LLM Agents) to automatically simulate user interaction behaviors with real webpages, and suggests AgentA/B can emulate human-like behavior patterns.

Abstract

A/B testing is a widely adopted method for evaluating UI/UX design decisions in modern web applications. Yet traditional A/B testing remains constrained by its dependence on large-scale live traffic from human participants and by the long wait for test results. Through formative interviews with six experienced industry practitioners, we identified critical bottlenecks in current A/B testing workflows. In response, we present AgentA/B, a novel system that leverages Large Language Model-based autonomous agents (LLM Agents) to automatically simulate user interaction behaviors with real webpages. AgentA/B enables scalable deployment of LLM agents with diverse personas, each capable of navigating dynamic webpages and interactively executing multi-step interactions such as searching, clicking, filtering, and purchasing. In a demonstrative controlled experiment, we employ AgentA/B to simulate a between-subject A/B test with 1,000 LLM agents on Amazon.com, and compare agent behaviors with real human shopping behaviors at scale. Our findings suggest AgentA/B can emulate human-like behavior patterns.

Paper Structure

This paper contains 29 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The workflow of web A/B testing and three challenges reported in the formative study: (1) the cost and difficulty of securing enough user traffic for significant results, (2) a testing period that can span weeks or months, and (3) limited testing opportunities.
  • Figure 2: One action prediction iteration of the automated web testing in AgentA/B with an LLM agent. (1) An Agent Profiling Module maintains a comprehensive agent description with an LLM-generated persona, a user-specified intention, and the action history of the current session. In the meantime, (2) the Environment Parsing Module parses the webpage into a structured web representation and an action space. (3) All of this information is fed into the LLM Agent to predict the next action, (4) which is executed by the Action Execution Module in the web environment to drive the next iteration. The lightweight Environment Parsing Module and Action Execution Module are website-specific by nature; the other modules, including the LLM Agent, are generalizable to different websites.
  • Figure 3: UXAgent architecture design [luUXAgentLLMAgentBased2025a].
  • Figure 4: Number of LLM agents who completed a purchase under control and treatment conditions.
  • Figure 5: Comparison of average customer spending across conditions, broken down by gender (a) and by age groups (b).
  • ...and 1 more figure
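
The action prediction iteration described in the Figure 2 caption can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the AgentA/B implementation: the names `AgentProfile`, `Observation`, `run_session`, `ToyShop`, and `first_action_policy` are hypothetical, the LLM call is replaced by a stub that always picks the first available action, and the "website" is a four-state toy rather than a real page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentProfile:
    """Agent description: persona, intention, and this session's action history."""
    persona: str
    intention: str
    history: List[str] = field(default_factory=list)


@dataclass
class Observation:
    """Structured representation of the current page plus its action space."""
    page: Dict[str, str]
    action_space: List[str]


def run_session(
    profile: AgentProfile,
    parse_env: Callable[[], Observation],
    predict_action: Callable[[AgentProfile, Observation], str],
    execute_action: Callable[[str], None],
    max_steps: int = 20,
) -> List[str]:
    """Repeat parse -> predict -> execute until "stop" or the step budget ends."""
    for _ in range(max_steps):
        obs = parse_env()                      # (2) environment parsing
        action = predict_action(profile, obs)  # (3) next-action prediction
        if action not in obs.action_space:
            break                              # reject actions outside the space
        profile.history.append(action)         # (1) profile keeps the history
        if action == "stop":
            break
        execute_action(action)                 # (4) action execution
    return profile.history


class ToyShop:
    """Hypothetical stand-in for the website-specific parser and executor."""

    FLOW = {
        "home": {"search": "results"},
        "results": {"click_product": "product"},
        "product": {"purchase": "done"},
        "done": {},
    }

    def __init__(self) -> None:
        self.state = "home"

    def parse(self) -> Observation:
        actions = list(self.FLOW[self.state]) + ["stop"]
        return Observation(page={"name": self.state}, action_space=actions)

    def execute(self, action: str) -> None:
        self.state = self.FLOW[self.state][action]


def first_action_policy(profile: AgentProfile, obs: Observation) -> str:
    """Placeholder for the LLM agent: always choose the first listed action."""
    return obs.action_space[0]


if __name__ == "__main__":
    shop = ToyShop()
    agent = AgentProfile(persona="budget-conscious shopper", intention="buy a kettle")
    print(run_session(agent, shop.parse, first_action_policy, shop.execute))
```

Passing the parser and executor in as callables mirrors the caption's split between website-specific modules and a generalizable agent: only `ToyShop` would need replacing to target a different site in this sketch.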