PAFFA: Premeditated Actions For Fast Agents

Shambhavi Krishna, Zheng Chen, Yuan Ling, Xiaojiang Huang, Yingjie Li, Fan Yang, Xiang Li

TL;DR

PAFFA addresses the inefficiency and brittleness of per-step LLM-based web navigation by introducing an Action Library of pre-computed, reusable interaction APIs. It offers two zero-shot construction strategies, Dist-Map and Unravel, and an inference-time path that maps natural language requests to pre-computed actions, cutting runtime token usage by about 87%. Unravel also adapts at runtime to novel tasks and websites, capturing successful traces to grow the library without retraining the base LLM. The approach is competitive on Mind2Web (0.57 vs. 0.50 step accuracy over the baseline), generalizes better across sites, and points toward scalable, internet-scale AI agents.
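
The runtime path described above (reuse a pre-computed action when one exists, otherwise explore and keep the successful trace) can be sketched as a lookup with a fallback. This is a minimal illustration under stated assumptions, not PAFFA's implementation: the request is assumed to be already resolved to an intent key (in PAFFA that mapping starts from natural language), `explore` is a hypothetical stand-in for Unravel's LLM-driven exploration, and `ActionLibrary`, `run_request`, `bind`, and the `{slot}` placeholder convention are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# One primitive browser step: (operation, target selector, optional value).
# This step format is a hypothetical convention for the sketch.
Step = Tuple[str, str, Optional[str]]


@dataclass
class ActionLibrary:
    """Hypothetical store of reusable, parameterized action sequences keyed by intent."""
    apis: Dict[str, List[Step]] = field(default_factory=dict)

    def lookup(self, intent: str) -> Optional[List[Step]]:
        return self.apis.get(intent)

    def add_trace(self, intent: str, steps: List[Step]) -> None:
        # Record a successful exploration so later requests can reuse it.
        self.apis[intent] = list(steps)


def bind(steps: List[Step], params: Dict[str, str]) -> List[Step]:
    """Fill {slot} placeholders in step values with request-specific parameters."""
    return [(op, target, value.format(**params) if value else value)
            for op, target, value in steps]


def run_request(
    library: ActionLibrary,
    intent: str,
    explore: Callable[[str], Optional[List[Step]]],
) -> Tuple[List[Step], bool]:
    """Return (steps, used_explorer), reusing the library whenever possible."""
    steps = library.lookup(intent)
    if steps is not None:
        return steps, False  # library hit: the explorer is never called
    steps = explore(intent)  # stand-in for an expensive exploration step
    if steps is None:
        return [], True      # exploration failed; nothing is stored
    library.add_trace(intent, steps)
    return steps, True


if __name__ == "__main__":
    def demo_explore(intent: str) -> Optional[List[Step]]:
        # Pretend an exploration found these steps for the intent.
        return [("click", "#search", None), ("type", "#q", "{query}")]

    lib = ActionLibrary()
    steps, explored = run_request(lib, "search_site", demo_explore)
    print(explored, bind(steps, {"query": "running shoes"}))
    steps, explored = run_request(lib, "search_site", demo_explore)
    print(explored)  # False: served from the library
```

Because a hit skips the explorer entirely, the expensive call is paid once per intent rather than once per request, which mirrors in miniature the kind of reuse the abstract credits for the token savings.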

Abstract

Modern AI assistants have made significant progress in natural language understanding and tool use, with emerging efforts to interact with web interfaces. However, current approaches that rely heavily on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. We introduce PAFFA (Premeditated Actions For Fast Agents), a method that makes LLMs faster and more accurate at completing tasks on the internet through a novel inference-time technique that requires no task-specific training. PAFFA constructs an 'Action Library', leveraging the parametric knowledge of the base LLM to pre-compute browser interaction patterns that generalize across tasks. By strategically reusing LLM inference across tasks, either via 'Dist-Map' for task-agnostic identification of key interactive web elements or via 'Unravel' for first-encounter, stateful exploration of novel tasks and sites, PAFFA reduces inference-time token usage by 87% while maintaining robust performance (0.57 vs. 0.50 step accuracy for the baseline). Further, because Unravel updates its action library based on its explorations, PAFFA generalizes and adapts to unseen websites. In sum, this work demonstrates that LLM reasoning sequences can generalize across prompts, offering a way to scale inference-time techniques to internet-scale data with a sublinear token count.
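
Dist-Map's goal of identifying a page's key interactive elements once, independent of any particular task, can be illustrated with a rule-based stand-in. The sketch below is not the paper's method (in the paper, Dist-Map relies on LLM inference); it simply walks the HTML with Python's standard `html.parser` and keeps a hypothetical set of interactive tags and attributes. `INTERACTIVE_TAGS`, `KEPT_ATTRS`, `InteractiveElementMapper`, `map_interactive_elements`, and `render_compact` are all illustrative assumptions. The compact map it produces could be computed once per page and reused across tasks instead of re-sending raw HTML at every step.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Tags treated as interactive in this sketch (an assumption, not the paper's definition).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Tags whose inner text is captured as a human-readable label.
TEXT_TAGS = {"a", "button", "textarea"}
# Attributes kept as a compact, task-agnostic description of each element.
KEPT_ATTRS = ("id", "name", "type", "placeholder", "aria-label", "href")


class InteractiveElementMapper(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Dict[str, str]] = []
        self._open: Optional[Dict[str, str]] = None  # element awaiting its text label

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = {k: (v or "") for k, v in attrs}  # boolean attributes arrive as None
        entry = {"tag": tag}
        entry.update({k: attr_map[k] for k in KEPT_ATTRS if k in attr_map})
        entry["text"] = ""
        self.elements.append(entry)
        # Void elements such as <input> carry no text content.
        self._open = entry if tag in TEXT_TAGS else None

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self._open["text"] = (self._open["text"] + " " + data.strip()).strip()

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None


def map_interactive_elements(html: str) -> List[Dict[str, str]]:
    parser = InteractiveElementMapper()
    parser.feed(html)
    parser.close()
    return parser.elements


def render_compact(elements: List[Dict[str, str]]) -> str:
    """One short line per element, e.g. for inclusion in a prompt."""
    lines = []
    for i, el in enumerate(elements):
        parts = [f"[{i}]", el["tag"]]
        parts += [f"{k}={v!r}" for k, v in el.items() if k not in ("tag", "text") and v]
        if el["text"]:
            parts.append(repr(el["text"]))
        lines.append(" ".join(parts))
    return "\n".join(lines)


if __name__ == "__main__":
    page = '<input id="q" placeholder="Search"><button id="go">Go</button>'
    print(render_compact(map_interactive_elements(page)))
```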

Paper Structure

This paper contains 24 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Creating task-specific scripts.
  • Figure 2: Grouping tasks solvable by one API.
  • Figure 3: Creating APIs per group.
  • Figure 4: Common workflow of existing solutions such as MindAct (deng2023mind2webgeneralistagentweb).
  • Figure 5: Using Action Library.
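
Figures 2 and 3 describe grouping tasks that a single API can solve and then creating one API per group. This summary does not say how PAFFA performs the grouping, so the sketch below substitutes a deliberately simple, hypothetical rule: tasks whose text matches once quoted strings and integers are replaced by a `<slot>` placeholder are treated as one group, and each group yields one parameterized API spec. `SLOT_PATTERN`, `STOPWORDS`, and all function names are illustrative assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical slot rule: quoted strings and standalone integers become parameters.
SLOT_PATTERN = re.compile(r'"[^"]*"|\b\d+\b')
# Words ignored when naming an API (an illustrative choice).
STOPWORDS = {"a", "an", "the", "to", "for", "in", "on", "of"}


def to_template(task: str) -> Tuple[str, List[str]]:
    """Return the task with slot values replaced by <slot>, plus the values removed."""
    values = [m.group(0).strip('"') for m in SLOT_PATTERN.finditer(task)]
    template = SLOT_PATTERN.sub("<slot>", task.lower())
    return template, values


def group_tasks(tasks: List[str]) -> Dict[str, List[List[str]]]:
    """Group tasks that share a template; each group is one candidate API."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for task in tasks:
        template, values = to_template(task)
        groups[template].append(values)
    return dict(groups)


def make_api_name(template: str) -> str:
    """Name an API after the template's non-slot, non-stopword words."""
    words = re.findall(r"[a-z]+", template.replace("<slot>", " "))
    return "_".join(w for w in words if w not in STOPWORDS)


def build_api_specs(tasks: List[str]) -> Dict[str, dict]:
    """One spec per group: template, parameter count, and observed argument values."""
    specs: Dict[str, dict] = {}
    for template, examples in group_tasks(tasks).items():
        specs[make_api_name(template)] = {
            "template": template,
            "n_params": template.count("<slot>"),
            "examples": examples,
        }
    return specs


if __name__ == "__main__":
    tasks = [
        'Search flights to "Paris" for 2 adults',
        'Search flights to "Tokyo" for 3 adults',
        'Add "milk" to cart',
    ]
    for name, spec in build_api_specs(tasks).items():
        print(name, spec["n_params"], spec["examples"])
```

With the demo tasks, the two flight searches collapse into one two-parameter spec while the cart task forms its own group. A rule this literal misses paraphrased tasks, which is presumably where an LLM-based grouping would help.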