Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents

Jaekyeom Kim, Dong-Ki Kim, Lajanugen Logeswaran, Sungryull Sohn, Honglak Lee

TL;DR

Auto-Intent is a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning, evaluated empirically on web navigation tasks.

Abstract

In this paper, we introduce Auto-Intent, a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning; we empirically focus on web navigation tasks. Our approach first discovers the underlying intents from target-domain demonstrations in an unsupervised manner, in a highly compact form (up to three words). With the extracted intents, we train an intent predictor to predict the next intent given the agent's past observations and actions. In particular, we propose a self-exploration approach in which the top-k most probable intent predictions are provided as hints to the pre-trained LLM agent, which leads to enhanced decision-making capabilities. Auto-Intent substantially improves the performance of GPT-{3.5, 4} and Llama-3.1-{70B, 405B} agents on the large-scale real-website navigation benchmarks from Mind2Web and, through cross-benchmark generalization from Mind2Web, on online navigation tasks from WebArena.
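
The self-exploration step described in the abstract can be illustrated with a minimal sketch. It assumes the intent predictor returns candidate intents with scores; the function names (`top_k_intents`, `build_hinted_prompt`), the prompt wording, and the example intents are hypothetical and are not taken from the paper's actual prompts.

```python
from typing import List, Tuple


def top_k_intents(scored: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the k highest-scoring intents, best first (hypothetical helper)."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [intent for intent, _ in ranked[:k]]


def build_hinted_prompt(task: str, past_actions: List[str], intents: List[str]) -> str:
    """Assemble a prompt that offers the candidate intents as hints.

    The layout here is an assumption for illustration only.
    """
    lines = [f"Task: {task}", "Previous actions:"]
    if past_actions:
        lines.extend(f"  {i + 1}. {action}" for i, action in enumerate(past_actions))
    else:
        lines.append("  (none)")
    lines.append("Possible next intents (hints):")
    lines.extend(f"  - {intent}" for intent in intents)
    lines.append("Choose the next action.")
    return "\n".join(lines)


# Hypothetical predictor output: short intents (at most three words) with scores.
candidates = [("select date", 0.2), ("open search", 0.5), ("choose airline", 0.3)]
hints = top_k_intents(candidates, k=2)
prompt = build_hinted_prompt("Book a flight to Paris", ["CLICK [12]"], hints)
```

Passing several short candidate intents, rather than a single prediction, lets the LLM agent weigh semantically different hypotheses before committing to an action.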

Paper Structure

This paper contains 28 sections, 3 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Overview of Auto-Intent: Given a dataset of demonstration trajectories, we first extract natural language intents in an unsupervised manner and train an intent predictor. Enforcing the intents to be concise phrases and providing top-$k$ intent predictions as hints to an LLM agent allows efficient internal exploration of semantically diverse intent hypotheses, resulting in improved action prediction. See text for details.
  • Figure 2: A hard example of intent discovery: the action (CLICK <svg id=5 />) does not provide any semantics about the intent. Our intent extractor successfully discovers the underlying intent by thoroughly understanding the context and connecting to the relevant parts.
  • Figure 3: Recall of the intent labels with respect to the top-$k$ predicted intents on Mind2Web's test sets ($N=20$).
  • Figure 4: The prompt for our intent extractor. We show only one in-context example due to the space limit.
  • Figure 5: The prompt for our LLM policy with predicted intents. We show only one in-context example due to the space limit.
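
The top-$k$ recall reported in Figure 3 can be illustrated with a short sketch. It assumes exact string matching between predicted and labeled intents, which is a simplification; the matching criterion used in the paper is not stated in this summary, and the function name `recall_at_k` and the example data are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(ranked_predictions: Sequence[List[str]],
                gold_labels: Sequence[str],
                k: int) -> float:
    """Fraction of examples whose gold intent appears among the top-k predictions.

    Assumes exact string match (an illustrative simplification).
    """
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("predictions and labels must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(
        1 for preds, gold in zip(ranked_predictions, gold_labels)
        if gold in preds[:k]
    )
    return hits / len(gold_labels)


# Hypothetical ranked predictions (best first) and gold intent labels.
predictions = [
    ["open search", "select date", "choose airline"],
    ["enter city", "apply filter", "submit form"],
]
labels = ["select date", "submit form"]
recalls = [recall_at_k(predictions, labels, k) for k in (1, 2, 3)]
```

Recall can only stay the same or rise as $k$ grows, which is why a larger hint set is more likely to contain the labeled intent.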