An Empirical Comparison of Generative Approaches for Product Attribute-Value Identification

Kassem Sabeh, Robert Litschko, Mouna Kacimi, Barbara Plank, Johann Gamper

TL;DR

The paper tackles the open-world task of Product Attribute and Value Identification (PAVI) by framing it as a generation problem and comparing three AVG strategies—Pipeline AVG, Multitask AVG, and End2End AVG—built on fine-tuned encoder–decoder models such as T5 and BART. It conducts a comprehensive evaluation on three real-world datasets (AE-110K, OA-Mine, MAVE), finding that End2End AVG generally yields the strongest F1 scores, with model size and dataset affecting relative performance; ensemble methods further boost recall. The study highlights cross-dataset generalization challenges due to differing attribute vocabularies and domains, and provides detailed cost analyses and practical guidance for deployment. By releasing code and data splits, it offers a solid benchmark for generation-based PAVI and informs design choices for production systems in search and recommendations.

Abstract

Product attributes are crucial for e-commerce platforms, supporting applications like search, recommendation, and question answering. The task of Product Attribute and Value Identification (PAVI) involves identifying both attributes and their values from product information. In this paper, we formulate PAVI as a generation task and provide, to the best of our knowledge, the most comprehensive evaluation of PAVI so far. We compare three different attribute-value generation (AVG) strategies based on fine-tuning encoder-decoder models on three datasets. Experiments show that the end-to-end AVG approach, which is computationally efficient, outperforms the other strategies. However, results differ depending on model size and the underlying language model. The code to reproduce all experiments is available at: https://github.com/kassemsabeh/pavi-avg
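
To make the "PAVI as generation" framing concrete, here is a minimal sketch of how an end-to-end setup could serialize attribute-value pairs into a single target string and score a generated string against the gold pairs. The serialization format (`attribute: value` joined by `|`), the function names, and the exact-match F1 are illustrative assumptions, not the format or metric implementation used in the paper or its repository.

```python
from typing import List, Tuple

Pair = Tuple[str, str]

# Hypothetical separators: the paper's actual target format is not shown here.
PAIR_SEP = "|"
KV_SEP = ":"


def linearize(pairs: List[Pair]) -> str:
    """Serialize attribute-value pairs into one target string for a seq2seq model."""
    return f" {PAIR_SEP} ".join(f"{attr}{KV_SEP} {value}" for attr, value in pairs)


def parse(generated: str) -> List[Pair]:
    """Recover attribute-value pairs from a generated string, skipping malformed chunks."""
    pairs = []
    for chunk in generated.split(PAIR_SEP):
        attr, sep, value = chunk.partition(KV_SEP)
        attr, value = attr.strip(), value.strip()
        if sep and attr and value:
            pairs.append((attr, value))
    return pairs


def pair_f1(predicted: List[Pair], gold: List[Pair]) -> float:
    """Exact-match F1 over (attribute, value) pairs (an assumed scoring rule)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("Brand", "Acme"), ("Flavor", "Vanilla")]
    target = linearize(gold)
    print(target)
    # A generation that drops the separator in the second pair is only partly recoverable.
    predicted = parse("Brand: Acme | Flavor Vanilla")
    print(predicted, pair_f1(predicted, gold))
```

In this sketch the input to the model would be the product text and the output the linearized string. A pipeline or multitask variant would instead generate attributes and values in separate steps, which this example does not cover.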

Paper Structure (13 sections, 6 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: An example of a product title with tagged attribute-value pairs.
  • Figure 2: Overview of the proposed AVG approaches.
  • Figure 3: Examples of cross-domain attribute-value identification. Correct predictions are highlighted in green, and wrong ones in red. In the first example, the T5 model trained on OA-Mine incorrectly predicts food-related attributes, showing domain bias, while the in-domain T5 model, trained on the MAVE dataset, correctly identifies all attribute-value pairs. In the second example, both cross-domain T5 models, trained on MAVE and AE-110K, fail to identify the Flavor attribute.