LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction

Chenhao Fang, Xiaohan Li, Zezhong Fan, Jianpeng Xu, Kaushiki Nag, Evren Korpeoglu, Sushant Kumar, Kannan Achan

TL;DR

This work tackles the problem of robustly extracting product attribute values in e-commerce by leveraging multiple Large Language Models (LLMs). It introduces LLM-ensemble, a crowdsourcing-inspired method that treats each LLM as a worker and uses a Dawid-Skene latent-variable framework to iteratively learn per-LLM weights and aggregate the LLMs' labels with those weights, with efficient convergence and safe deployment. The approach is validated both offline on Walmart's internal data and online via A/B testing, where it outperforms every single LLM and the baselines and delivers measurable business gains (improvements in GMV, CTR, CVR, and ATC). The results show that combining signals from diverse LLMs can substantially improve product attribute value extraction and, in production, e-commerce recommendation quality, with practical impact on user engagement and sales.

Abstract

Product attribute value extraction is a pivotal task in Natural Language Processing (NLP) and in the contemporary e-commerce industry. Precise product attribute values are fundamental to high-quality recommendations and customer satisfaction. Recently emerging Large Language Models (LLMs) have demonstrated state-of-the-art performance in numerous attribute extraction tasks without requiring domain-specific training data. Nevertheless, different LLMs exhibit different strengths and weaknesses owing to differences in their training data, architectures, and hyperparameters. This variation makes them complementary: no single LLM dominates all others. It is therefore necessary to develop an ensemble method that exploits their complementary strengths. In this paper, we propose a novel algorithm, LLM-ensemble, that combines the outputs of different LLMs for attribute value extraction. We iteratively learn a weight for each LLM and predict the final attribute value by aggregating the LLMs' labels with these weights. The proposed method can be proven theoretically optimal, and it also offers efficient computation, fast convergence, and safe deployment. We have also conducted extensive experiments with various state-of-the-art LLMs, including Llama2-13B, Llama2-70B, PaLM-2, GPT-3.5, and GPT-4, on Walmart's internal data. Our offline metrics show that LLM-ensemble outperforms every state-of-the-art single LLM on this dataset. The method has been launched in several production models, leading to improved Gross Merchandise Volume (GMV), Click-Through Rate (CTR), Conversion Rate (CVR), and Add-to-Cart Rate (ATC).
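
The weighted aggregation can be made concrete under a simplifying assumption of ours, the "one-coin" variant of the Dawid-Skene model (the paper may instead use the full per-LLM confusion matrix). Given the true label, the $N$ LLMs err independently, and LLM $i$ returns the correct label with probability $p_i$ and otherwise one of the remaining $K-1$ labels uniformly at random. Writing $W_{iq}$ for the label LLM $i$ outputs for target $q$ (the symbols $k$, $K$, $Q$, $p_i$, $v_i$ are our notation, not the paper's), each LLM contributes $\log p_i$ to the log-likelihood of label $k$ if it voted for $k$ and $\log\frac{1-p_i}{K-1}$ otherwise, so

$$\log P(W_{1q},\dots,W_{Nq} \mid y_q = k) = \sum_{i=1}^{N} \log\frac{1-p_i}{K-1} + \sum_{i=1}^{N} \mathbb{1}[W_{iq}=k]\,\log\frac{(K-1)\,p_i}{1-p_i}.$$

The first sum does not depend on $k$, so with a uniform prior over labels the most likely label is a weighted vote with log-odds weights:

$$\hat{y}_q = \arg\max_{k \in \{1,\dots,K\}} \sum_{i=1}^{N} v_i\,\mathbb{1}[W_{iq} = k], \qquad v_i = \log\frac{(K-1)\,p_i}{1-p_i}.$$

Because the $p_i$ are unknown, one simple iterative scheme alternates this vote with the re-estimate $p_i \leftarrow \frac{1}{Q}\sum_{q=1}^{Q} \mathbb{1}[W_{iq} = \hat{y}_q]$ over $Q$ targets. Note that $v_i < 0$ exactly when $p_i < 1/K$: an LLM that does worse than random guessing counts as evidence against the labels it outputs.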

Paper Structure (12 sections, 1 figure, 2 tables, 1 algorithm)

Figures (1)

  • Figure 1: (a) The input data matrix $W$. We take the attribute "gender" as an example, and its labels are "Male" (M), "Female" (F), and "Unisex" (U). (b) The illustration of LLM-ensemble procedures. To learn the label of a product for attribute $q$, we have $N$ LLMs as inputs to the LLM-Ensemble algorithm. After several rounds of iteration, the algorithm generates the weights for each LLM and aggregates the labels with weights to predict the final label $\hat{y}_q$.
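
The procedure in Figure 1 can be illustrated with a small sketch. This is our own illustration, not the authors' Algorithm 1. It assumes that $W$ stores one row per LLM and one column per product, models each LLM with a single accuracy parameter (the "one-coin" simplification of Dawid-Skene), and fits that model by hard EM: it alternates between a weighted vote and re-estimating each LLM's accuracy as its agreement with the current consensus. The function names `weighted_vote` and `iterative_weighted_vote`, the clipping constant `eps`, the iteration cap, and the example outputs of LLMs A to E are all hypothetical.

```python
import math

# Label set for the "gender" attribute in Figure 1: Male, Female, Unisex.
LABELS = ["M", "F", "U"]


def weighted_vote(W, weights, labels):
    """Weighted majority vote per product.

    W[i][q] is the label LLM i returned for product q (assumed layout).
    Ties go to the label that appears first in `labels`.
    """
    preds = []
    for q in range(len(W[0])):
        scores = {k: 0.0 for k in labels}
        for i, row in enumerate(W):
            scores[row[q]] += weights[i]
        preds.append(max(labels, key=scores.get))
    return preds


def iterative_weighted_vote(W, labels, n_iter=20, eps=1e-3):
    """Hypothetical sketch: one-coin Dawid-Skene fitted by hard EM.

    Alternates between (1) estimating each LLM's accuracy as its agreement
    with the current consensus and (2) re-voting with log-odds weights.
    """
    n_items, K = len(W[0]), len(labels)
    weights = [1.0] * len(W)                 # round 0: plain majority vote
    preds = weighted_vote(W, weights, labels)
    for _ in range(n_iter):
        acc = [sum(a == b for a, b in zip(row, preds)) / n_items for row in W]
        acc = [min(max(p, eps), 1 - eps) for p in acc]  # keep the log finite
        weights = [math.log((K - 1) * p / (1 - p)) for p in acc]
        new_preds = weighted_vote(W, weights, labels)
        if new_preds == preds:               # consensus stopped changing
            break
        preds = new_preds
    return preds, weights


# Hypothetical outputs of five LLMs on six products (rows: LLMs, columns: products).
W = [
    ["M", "F", "U", "M", "F", "U"],  # LLM A
    ["M", "F", "U", "M", "F", "U"],  # LLM B
    ["M", "F", "U", "M", "U", "M"],  # LLM C
    ["F", "M", "M", "F", "M", "M"],  # LLM D (noisy)
    ["U", "M", "F", "U", "F", "M"],  # LLM E (noisy)
]

print(weighted_vote(W, [1.0] * len(W), LABELS))  # ['M', 'F', 'U', 'M', 'F', 'M']
preds, weights = iterative_weighted_vote(W, LABELS)
print(preds)                                     # ['M', 'F', 'U', 'M', 'F', 'U']
print([round(w, 2) for w in weights])
```

On this toy matrix, plain majority vote labels the last product "M" because C, D, and E agree on it. Once D and E are down-weighted for disagreeing with the consensus on the other products, A and B outweigh C and the label flips to "U". A drawback of hard EM on very small data is that an LLM matching the consensus everywhere gets an accuracy clipped to `1 - eps` and can dominate the vote. Soft EM, which uses posterior label probabilities instead of hard consensus labels, is the usual remedy.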

Theorems & Definitions (1)

  • Definition 1: Attribute Value Extraction with LLM ensemble.