LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications

Danqing Zhang, Balaji Rama, Jingyi Ni, Shiying He, Fu Zhao, Kunyu Chen, Arnold Chen, Junyu Cao

TL;DR

LiteWebAgent tackles the lack of production-ready, open-source tools for VLM-based web agents by decoupling action generation from grounding, integrating planning and memory, and adding tree search to enable multi-trajectory exploration. It delivers a flexible, serverless-friendly framework with asynchronous FastAPI APIs and two deployment formats: a Vercel-hosted full-stack web app and a Chrome extension using CDP. The work contributes a concrete agent framework, memory-informed planning, and a tree-search extension, along with practical implementations such as a replay module and VLM-based grounding functions, demonstrated through full-stack and Chrome-extension deployments. The practical impact lies in providing developers and researchers with an extensible, production-oriented platform to build, test, and deploy VLM-driven web agents with end-to-end browser control and real-time visualization.

Abstract

We introduce LiteWebAgent, an open-source suite for VLM-based web agent applications. Our framework addresses a critical gap in the web agent ecosystem with a production-ready solution that combines minimal serverless backend configuration, intuitive user and browser interfaces, and extensible research capabilities in agent planning, memory, and tree search. For the core LiteWebAgent agent framework, we implemented a simple yet effective baseline using recursive function calling, with action generation decoupled from action grounding. In addition, we integrate advanced research components such as agent planning, agent workflow memory, and tree search in a modular and extensible manner. We then integrate the LiteWebAgent agent framework with a frontend and backend as deployed systems in two formats: (1) a production Vercel-based web application, which provides users with an agent-controlled remote browser, and (2) a Chrome extension that uses LiteWebAgent's API to control an existing Chrome browser via CDP (Chrome DevTools Protocol). The LiteWebAgent framework is available at https://github.com/PathOnAI/LiteWebAgent, with the deployed frontend at https://lite-web-agent.vercel.app/.

Paper Structure

This paper contains 22 sections, 4 figures.

Figures (4)

  • Figure 1: Agent workflow
  • Figure 2: System Design: High Level Overview
  • Figure 3: Screenshot of frontend UI
  • Figure 4: Screenshot of Chrome extension UI