An Empirical Comparison of Generative Approaches for Product Attribute-Value Identification
Kassem Sabeh, Robert Litschko, Mouna Kacimi, Barbara Plank, Johann Gamper
TL;DR
The paper tackles the open-world task of Product Attribute and Value Identification (PAVI) by framing it as a generation problem and comparing three attribute-value generation (AVG) strategies (Pipeline AVG, Multitask AVG, and End2End AVG) built on fine-tuned encoder-decoder models such as T5 and BART. It conducts a comprehensive evaluation on three real-world datasets (AE-110K, OA-Mine, MAVE) and finds that End2End AVG generally yields the strongest F1 scores, although relative performance depends on model size and dataset; ensemble methods further boost recall. The study highlights cross-dataset generalization challenges arising from differing attribute vocabularies and domains, and provides detailed cost analyses and practical guidance for deployment. By releasing code and data splits, it offers a solid benchmark for generation-based PAVI and informs design choices for production systems in search and recommendation.
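The TL;DR reports results as F1 and notes that ensembles raise recall. The sketch below shows one common way to score attribute-value predictions: micro-averaged precision, recall, and F1 over exact matches of (attribute, value) pairs after light normalization. It is an illustration under assumptions, not the paper's evaluation script: the exact-match criterion, the lowercasing in `normalize`, and the union-based `union_ensemble` are hypothetical choices. A union ensemble can only add true positives, so its recall is never lower than that of any member, while its precision may drop.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (attribute, value)


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Hypothetical normalization: strip and lowercase both fields, drop duplicates."""
    return {(a.strip().lower(), v.strip().lower()) for a, v in pairs}


def pair_f1(gold: List[List[Pair]], pred: List[List[Pair]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, F1 over exact (attribute, value) matches."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same products")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_set, p_set = normalize(g), normalize(p)
        tp += len(g_set & p_set)   # predicted pairs that are in the gold set
        fp += len(p_set - g_set)   # predicted pairs not in the gold set
        fn += len(g_set - p_set)   # gold pairs the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def union_ensemble(*model_preds: List[List[Pair]]) -> List[List[Pair]]:
    """Hypothetical ensemble: keep every pair predicted by any model for each product."""
    merged = []
    for per_product in zip(*model_preds):
        merged.append(sorted(set().union(*(normalize(p) for p in per_product))))
    return merged


gold = [[("Color", "red"), ("Material", "cotton")], [("Brand", "Acme")]]
model_a = [[("color", "red")], [("brand", "acme")]]
model_b = [[("material", "cotton"), ("size", "XL")], []]

for name, pred in [("A", model_a), ("B", model_b), ("A+B", union_ensemble(model_a, model_b))]:
    p, r, f = pair_f1(gold, pred)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
# A: P=1.00 R=0.67 F1=0.80
# B: P=0.50 R=0.33 F1=0.40
# A+B: P=0.75 R=1.00 F1=0.86
```

In this toy example the union reaches full recall at the cost of one false positive contributed by model B.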
Abstract
Product attributes are crucial for e-commerce platforms, supporting applications such as search, recommendation, and question answering. The task of Product Attribute and Value Identification (PAVI) involves identifying both attributes and their values from product information. In this paper, we formulate PAVI as a generation task and provide, to the best of our knowledge, the most comprehensive evaluation of PAVI so far. We compare three different attribute-value generation (AVG) strategies based on fine-tuning encoder-decoder models on three datasets. Experiments show that the end-to-end AVG approach, which is computationally efficient, outperforms the other strategies. However, the results vary with model size and the underlying language model. The code to reproduce all experiments is available at: https://github.com/kassemsabeh/pavi-avg
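The abstract frames PAVI as generation and compares three AVG strategies. The sketch below illustrates how training pairs might be serialized for each one, assuming the strategies work as their names suggest: Pipeline AVG uses two separately fine-tuned models (attributes first, then one value per attribute), Multitask AVG trains a single model on both subtasks using task prefixes, and End2End AVG emits all pairs in one target sequence. The prefixes, the `; ` and `: ` separators, the attribute-then-value order, and every function name here are hypothetical, not the paper's format. The `decoding_calls` helper makes the efficiency argument concrete under the same assumption: End2End needs one generation call per product, whereas an attribute-then-value scheme needs one call plus one per predicted attribute.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]      # (attribute, value)
Example = Tuple[str, str]   # (encoder input, decoder target)

SEP = "; "  # hypothetical separator between generated items


def pipeline_examples(text: str, pairs: List[Pair]) -> Dict[str, List[Example]]:
    """Assumed Pipeline AVG: one model generates attributes, a second generates each value."""
    attribute_model = [(text, SEP.join(a for a, _ in pairs))]
    value_model = [(f"{text} | attribute: {a}", v) for a, v in pairs]
    return {"attribute_model": attribute_model, "value_model": value_model}


def multitask_examples(text: str, pairs: List[Pair]) -> List[Example]:
    """Assumed Multitask AVG: a single model sees both subtasks, told apart by a prefix."""
    data = [(f"generate attributes: {text}", SEP.join(a for a, _ in pairs))]
    data += [(f"generate value: {text} | attribute: {a}", v) for a, v in pairs]
    return data


def end2end_examples(text: str, pairs: List[Pair]) -> List[Example]:
    """Assumed End2End AVG: every attribute-value pair in one target sequence."""
    return [(text, SEP.join(f"{a}: {v}" for a, v in pairs))]


def parse_end2end(output: str) -> List[Pair]:
    """Split a generated 'attr: value; attr: value' string into pairs, skipping malformed items."""
    pairs = []
    for item in output.split(SEP):
        attr, colon, value = item.partition(": ")
        if colon and attr.strip() and value.strip():
            pairs.append((attr.strip(), value.strip()))
    return pairs


def decoding_calls(strategy: str, num_attributes: int) -> int:
    """Generation calls per product at inference, under the attribute-then-value assumption."""
    if strategy == "end2end":
        return 1
    if strategy in ("pipeline", "multitask"):
        return 1 + num_attributes
    raise ValueError(f"unknown strategy: {strategy}")


text = "Acme red cotton t-shirt, size XL"
pairs = [("Brand", "Acme"), ("Color", "red"), ("Material", "cotton")]

_, target = end2end_examples(text, pairs)[0]
print(target)                          # Brand: Acme; Color: red; Material: cotton
print(parse_end2end(target) == pairs)  # True
print(decoding_calls("end2end", 3), decoding_calls("pipeline", 3))  # 1 4
```

Because End2End output is free text, a parser such as `parse_end2end` has to tolerate malformed items; skipping them, as here, is one simple choice among several.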
