The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models

Alexandra Hotti, Riccardo Sven Risuleo, Stefan Magureanu, Aref Moradi, Jens Lagergren

TL;DR

This work empirically benchmarks a range of Graph Neural Networks (GNNs) on the web element nomination task and finds that a simple Convolutional GNN (GCN) outperforms complex state-of-the-art nomination methods.

Abstract

Web automation holds the potential to revolutionize how users interact with the digital world, offering unparalleled assistance and simplifying tasks via sophisticated computational methods. Central to this evolution is the web element nomination task, which entails identifying unique elements on webpages. Unfortunately, the development of algorithmic designs for web automation is hampered by the scarcity of comprehensive and realistic datasets that reflect the complexity faced by real-world applications on the Web. To address this, we introduce the Klarna Product Page Dataset, a comprehensive and diverse collection of webpages that surpasses existing datasets in richness and variety. The dataset features 51,701 manually labeled product pages from 8,175 e-commerce websites across eight geographic regions, accompanied by a dataset of rendered page screenshots. To initiate research on the Klarna Product Page Dataset, we empirically benchmark a range of Graph Neural Networks (GNNs) on the web element nomination task. We make three important contributions. First, we find that a simple Convolutional GNN (GCN) outperforms complex state-of-the-art nomination methods. Second, we introduce a training refinement procedure that involves identifying a small number of relevant elements from each page using the aforementioned GCN. These elements are then passed to a large language model for the final nomination. This procedure improves nomination accuracy by 16.8 percentage points on our challenging dataset, without any need for fine-tuning. Finally, in response to another prevalent challenge in this field, namely the abundance of training methodologies suitable for element nomination, we introduce the Challenge Nomination Training Procedure, a novel training approach that further boosts nomination accuracy.

Paper Structure

This paper contains 14 sections, 6 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Four pages from the Finnish, Dutch and US markets in our dataset.
  • Figure 2: A page in our dataset. The labeled elements are surrounded by red boxes: (1) buy button, (2) cart button, (3) image, (4) price, (5) name.
  • Figure 3: Difference between nomination and classification: On the left, a DOM-tree representation of a webpage is depicted. During classification, the Graph Neural Network embedder takes a node and its context (here, its directly neighboring nodes) as input. The output of the embedder is then fed into a classification layer that produces classification scores. In element nomination, the model is applied to every node in the tree. For each label type, the resulting classification scores are ranked across all nodes, and the highest-ranked node is nominated for that label type. For example, the node ranked highest for the Buy Button label is nominated as the Buy Button element of the page.
  • Figure 4: Substantially Enhanced Nomination Accuracy by Combining GCN-Mean with GPT-4: The performance of the GCN-Mean algorithm alone versus its performance when augmented with a GPT-4-based refinement step. In the enhanced scenario, the GCN-Mean algorithm initially filters the dataset to retain only the top $10$ elements with the highest classification scores on each page. Subsequently, GPT-4 undertakes the final nomination step, based on the local HTML content of these selected elements. This refinement step significantly boosts nomination accuracy across all tasks, resulting in an average increase of $16.82$ percentage points.
  • Figure 5: Effect on average validation error of performing the augmentation step at different times, consistently, or not at all, for FCN (left) and GCN-Mean (right).
  • ...and 1 more figure
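
The nomination step described in the Figure 3 caption and the top-10 filtering described in the Figure 4 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-node score dictionaries, the function names `nominate` and `shortlist`, and the toy values in `TOY_SCORES` are all hypothetical. In the paper, the scores would come from the GNN classifier, and the shortlisted elements would be handed to a large language model for the final choice.

```python
from typing import Dict, List

# The five label types shown in Figure 2.
LABELS = ["buy button", "cart button", "image", "price", "name"]

# One dict of classification scores per DOM node (hypothetical format).
NodeScores = Dict[str, float]


def nominate(scores: List[NodeScores]) -> Dict[str, int]:
    """For each label, nominate the index of the highest-scoring node."""
    return {
        label: max(range(len(scores)), key=lambda i: scores[i][label])
        for label in LABELS
    }


def shortlist(scores: List[NodeScores], label: str, k: int = 10) -> List[int]:
    """Return the indices of the k highest-scoring nodes for one label,
    best first. These candidates could then be passed to a second-stage
    model (an LLM in the paper) instead of nominating the top node directly."""
    order = sorted(
        range(len(scores)), key=lambda i: scores[i][label], reverse=True
    )
    return order[:k]


# Toy scores for a three-node page (made-up values for illustration only).
TOY_SCORES: List[NodeScores] = [
    {"buy button": 0.1, "cart button": 0.2, "image": 0.9, "price": 0.1, "name": 0.3},
    {"buy button": 0.8, "cart button": 0.3, "image": 0.1, "price": 0.2, "name": 0.1},
    {"buy button": 0.2, "cart button": 0.7, "image": 0.2, "price": 0.6, "name": 0.8},
]

if __name__ == "__main__":
    print(nominate(TOY_SCORES))
    print(shortlist(TOY_SCORES, "price", k=2))
```

The key distinction is that classification scores each node independently, whereas nomination ranks the scores across all nodes of a page and picks exactly one node per label. The shortlist variant keeps several candidates per label so that a later stage can make the final decision.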