LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents

Jinzhe Tan, Karim Benyekhlef

TL;DR

This paper outlines the justice gap and proposes LegalWebAgent, a multimodal web agent with three modules (Ask, Browse, Act) that plans, navigates, and acts to obtain legal information and complete procedures. The framework uses HTML and visual inputs to browse pages and perform actions autonomously. It is evaluated on 15 Québec civil-law tasks, reaching a peak success rate of 86.7% and an average of 84.4% across the tested models. Findings show strong capabilities in information retrieval and form submission, but reveal challenges in deep navigation and complex multi-step interactions, especially for some models. The work demonstrates the potential to empower laypeople in accessing legal services while highlighting design, efficiency, and safety considerations for deploying AI-powered web agents.

Abstract

Access to justice remains a global challenge, with many citizens still finding it difficult to seek help from the justice system when facing legal issues. Although the internet provides abundant legal information and services, navigating complex websites, understanding legal terminology, and filling out procedural forms continue to pose barriers to accessing justice. This paper introduces the LegalWebAgent framework that employs a web agent powered by multimodal large language models to bridge the gap in access to justice for ordinary citizens. The framework combines the natural language understanding capabilities of large language models with multimodal perception, enabling a complete process from user query to concrete action. It operates in three stages: the Ask Module understands user needs through natural language processing; the Browse Module autonomously navigates webpages, interacts with page elements (including forms and calendars), and extracts information from HTML structures and webpage screenshots; the Act Module synthesizes information for users or performs direct actions like form completion and schedule booking. To evaluate its effectiveness, we designed a benchmark test covering 15 real-world tasks, simulating typical legal service processes relevant to Québec civil law users, from problem identification to procedural operations. Evaluation results show LegalWebAgent achieved a peak success rate of 86.7%, with an average of 84.4% across all tested models, demonstrating high autonomy in complex real-world scenarios.

Paper Structure

This paper contains 19 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The overall workflow of LegalWebAgent. Given a user's query, LegalWebAgent formulates a plan, analyzes the webpage’s HTML elements and screenshots, and determines the appropriate actions (such as clicks, scrolls, or inputs). After gathering the necessary information or completing the requested task, it generates a concise summary of the process and presents the results to the user.
  • Figure 2: Example of a webpage screenshot provided to the web agent. By reading the HTML file, the application automatically adds borders to interactive elements on the webpage and labels them with numbered tags in the upper-right corner.
  • Figure 3: A demonstration of how LegalWebAgent navigates the website and completes an online form task through a series of actions on the legal-agent-sandbox we created.
  • Figure 4: Overall performance comparison and task-wise success rates of the evaluated models.
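
The element labeling described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the set of tags treated as interactive, the class name `InteractiveElementLabeler`, and the function `label_interactive_elements` are all assumptions made for illustration. The sketch only assigns sequential numbers to interactive elements found in the HTML; drawing borders and tags onto a screenshot is left out.

```python
from html.parser import HTMLParser

# Assumption: this tag set is an illustrative guess at what counts as
# "interactive"; the paper does not list the tags it uses.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementLabeler(HTMLParser):
    """Hypothetical helper: numbers interactive elements in document order."""

    def __init__(self):
        super().__init__()
        # Each entry is (label, tag name, attribute dict).
        self.labeled = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, including void elements such as <input>.
        if tag in INTERACTIVE_TAGS:
            label = len(self.labeled) + 1  # labels start at 1
            self.labeled.append((label, tag, dict(attrs)))


def label_interactive_elements(html):
    """Return a list of (label, tag, attrs) for interactive elements in html."""
    parser = InteractiveElementLabeler()
    parser.feed(html)
    parser.close()
    return parser.labeled


if __name__ == "__main__":
    page = (
        '<div><a href="/help">Help</a><p>Some text</p>'
        '<input type="text" name="q"><button id="go">Go</button></div>'
    )
    for label, tag, attrs in label_interactive_elements(page):
        print(label, tag, attrs)
```

A numbered list like this gives a language model a compact way to refer to a page element (for example, "click element 3") without having to reproduce a selector, which is one plausible reason for the labeling step the caption describes.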