WEPO: Web Element Preference Optimization for LLM-based Web Navigation
Jiarun Liu, Jia Hao, Chunhong Zhang, Zheng Hu
TL;DR
This work aims to improve autonomous web navigation by exploiting HTML structure through preference learning. It proposes WEPO, a framework that samples non-salient HTML elements as negative examples without supervision and trains with Direct Preference Optimization to align model actions with user intent. On the Mind2Web benchmark, WEPO achieves state-of-the-art performance, outperforming baselines such as WebAgent and CogAgent and demonstrating strong generalization across domains and tasks. The results indicate that contrastive, preference-based fine-tuning can substantially enhance web-page-based task execution, with promising future directions including HTML-specific encoders and scalability to longer contexts.
Abstract
The rapid advancement of autonomous web navigation has significantly benefited from grounding pretrained Large Language Models (LLMs) as agents. However, current research has yet to fully leverage the redundancy of HTML elements for contrastive training. This paper introduces a novel approach to LLM-based web navigation tasks, called Web Element Preference Optimization (WEPO). WEPO uses unsupervised preference learning: it samples distance-based non-salient web elements as negative samples and optimizes a maximum-likelihood objective within Direct Preference Optimization (DPO). We evaluate WEPO on the Mind2Web benchmark and empirically demonstrate that it aligns users' high-level intent with output actions more effectively. The results show that our method achieves state-of-the-art performance, with an improvement of 13.8% over WebAgent and 5.3% over the visual language model baseline CogAgent. Our findings underscore the potential of preference optimization to enhance web navigation and other web-page-based tasks, suggesting a promising direction for future research.
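The two ingredients named in the abstract, distance-based negative sampling and the DPO objective, can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the sampling rule, so the code below makes assumptions: elements are treated as a flat list in document order, distance is the absolute index difference from the ground-truth element, and negatives are drawn uniformly from elements at least `min_distance` away. The function names `sample_negatives` and `dpo_loss` and the parameters `min_distance` and `beta` are hypothetical and not taken from the paper. The loss follows the standard per-pair DPO form, computed from sequence log-probabilities under the trained policy and a frozen reference model.

```python
import math
import random


def sample_negatives(num_elements, target_idx, k, rng, min_distance=1):
    """Draw up to k negative element indices for one preference example.

    Assumption (not from the paper): distance is the index gap to the
    ground-truth element in a flattened element list, and any element at
    least `min_distance` away is an eligible non-salient negative.
    """
    candidates = [
        i for i in range(num_elements)
        if abs(i - target_idx) >= min_distance
    ]
    # Sample without replacement; return fewer than k if the page is small.
    return rng.sample(candidates, min(k, len(candidates)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each argument is a sequence log-probability.
    """
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    rng = random.Random(0)
    # Element 5 is the ground-truth action target on a 20-element page.
    negatives = sample_negatives(num_elements=20, target_idx=5, k=3, rng=rng,
                                 min_distance=2)
    print("negative element indices:", negatives)
    # The policy favours the chosen action more than the reference does,
    # so the loss falls below log(2), its value at zero margin.
    print("loss:", dpo_loss(-1.0, -4.0, -2.0, -3.0, beta=0.1))
```

In this sketch the chosen response is the action on the ground-truth element and each rejected response is the same action applied to a sampled negative element; how the paper constructs these pairs in detail is not stated in the abstract.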
