LLM-Enhanced Reranking for Complementary Product Recommendation

Zekun Xu, Yudi Zhang

TL;DR

The paper tackles the accuracy-diversity dilemma in complementary product recommendation with a model-agnostic reranking framework that applies LLM prompting on top of any baseline retriever. Two agents, one for diversity and one for accuracy, receive structured prompts and reorder the candidate items without retraining the underlying model. Across four public datasets, the approach improves both accuracy (Hit@K, NDCG@K) and diversity metrics: the diversity agent drives broader item coverage, while the accuracy agent improves precision at the cost of some diversity. The work offers a practical, retraining-free path to more balanced recommendations and suggests future work on iterative, multi-agent collaboration.

Abstract

Complementary product recommendation, which aims to suggest items that are used together to enhance customer value, is a crucial yet challenging task in e-commerce. While existing graph neural network (GNN) approaches have made significant progress in capturing complex product relationships, they often struggle with the accuracy-diversity tradeoff, particularly for long-tail items. This paper introduces a model-agnostic approach that leverages Large Language Models (LLMs) to enhance the reranking of complementary product recommendations. Unlike previous works that use LLMs primarily for data preprocessing and graph augmentation, our method applies LLM-based prompting strategies directly to rerank candidate items retrieved from existing recommendation models, eliminating the need for model retraining. Through extensive experiments on public datasets, we demonstrate that our approach effectively balances accuracy and diversity in complementary product recommendations, with at least 50% lift in accuracy metrics and 2% lift in diversity metrics on average for the top recommended items across datasets.
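
To make the reranking idea concrete, here is a minimal sketch of a retraining-free pipeline in which prompted agents reorder a baseline model's candidates in sequence. Everything in it is an assumption for illustration: the prompt wording, the comma-separated reply format, the helper names (`build_prompt`, `parse_ranking`, `make_agent`, `rerank`), the `keep` truncation size, and the choice to run the diversity agent before the accuracy agent. The summary above does not specify the actual prompts or what the per-agent hyperparameter controls, and `llm` is any caller-supplied function from prompt text to reply text.

```python
from typing import Callable, List, Sequence

# Hypothetical types: an LLM is any function from prompt text to reply text,
# and an agent maps (query item, candidates) to a reordered candidate list.
LLM = Callable[[str], str]
Agent = Callable[[str, Sequence[str]], List[str]]


def build_prompt(role: str, query: str, candidates: Sequence[str]) -> str:
    """Assemble a structured prompt. The wording is illustrative only."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        f"Role: {role} reranker for complementary products.\n"
        f"Query product: {query}\n"
        f"Candidates:\n{numbered}\n"
        "Return the candidate numbers in the new order, comma-separated."
    )


def parse_ranking(reply: str, candidates: Sequence[str]) -> List[str]:
    """Turn a reply such as '3, 1, 2' into a reordered candidate list.

    Invalid or repeated numbers are ignored; candidates the reply does not
    mention are appended in their original order, so nothing is lost.
    """
    order: List[int] = []
    for token in reply.replace("\n", ",").split(","):
        token = token.strip()
        if token.isdigit():
            idx = int(token) - 1
            if 0 <= idx < len(candidates) and idx not in order:
                order.append(idx)
    rest = [i for i in range(len(candidates)) if i not in order]
    return [candidates[i] for i in order + rest]


def make_agent(role: str, llm: LLM, keep: int) -> Agent:
    """Build an agent that reranks via the LLM and keeps the top `keep` items."""
    def agent(query: str, candidates: Sequence[str]) -> List[str]:
        reply = llm(build_prompt(role, query, candidates))
        return parse_ranking(reply, candidates)[:keep]
    return agent


def rerank(query: str, candidates: Sequence[str], agents: Sequence[Agent]) -> List[str]:
    """Apply the agents in order; the baseline retriever is never retrained."""
    current = list(candidates)
    for agent in agents:
        current = agent(query, current)
    return current


# Usage with a stub in place of a real LLM call.
def fake_llm(prompt: str) -> str:
    return "3, 1, 2"

baseline_candidates = ["phone case", "charger", "screen protector"]
agents = [make_agent("diversity", fake_llm, keep=3),
          make_agent("accuracy", fake_llm, keep=2)]
reranked = rerank("smartphone", baseline_candidates, agents)
```

Because the agents only consume and return candidate lists, the same loop works on top of any retriever, which is the sense in which such a design is model-agnostic. Falling back to the original order for unmentioned candidates is a defensive choice for malformed LLM replies, not something taken from the paper.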

Paper Structure

This paper contains 17 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Lift in accuracy (Column 1: Hit; Column 2: NDCG) and diversity (Column 3: Entropy; Column 4: Vocabulary Size) metrics by dataset (Electronics, Cell Phones, Grocery, Home). Error bars show the standard error across three baseline GNNs (GraphSAGE, GAT, SComGNN). Row 1: overall enhancement with both diversity and accuracy agents vs. baseline; Row 2: ablation enhancement with the diversity agent vs. baseline; Row 3: ablation enhancement with both diversity and accuracy agents vs. the diversity agent only. The underlying LLM is Llama3.3-70B. The hyperparameter is 50 for the diversity agent and 25 for the accuracy agent.
  • Figure 2: Same layout as Figure 1: lift in accuracy (Column 1: Hit; Column 2: NDCG) and diversity (Column 3: Entropy; Column 4: Vocabulary Size) metrics by dataset (Electronics, Cell Phones, Grocery, Home). Error bars show the standard error across three baseline GNNs (GraphSAGE, GAT, SComGNN). Row 1: overall enhancement with both diversity and accuracy agents vs. baseline; Row 2: ablation enhancement with the diversity agent vs. baseline; Row 3: ablation enhancement with both diversity and accuracy agents vs. the diversity agent only. The underlying LLM is Llama3.3-70B. The hyperparameter is 100 for the diversity agent and 50 for the accuracy agent.
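
The metrics named in the figure captions can be sketched as follows. Hit@K and NDCG@K use their standard definitions for the case of a single ground-truth complementary item per query. How the paper computes Entropy and Vocabulary Size is not stated on this page, so the versions below are assumptions: entropy is taken as the Shannon entropy of the token distribution over recommended item titles, and vocabulary size as the number of distinct tokens. The `lift` helper is likewise a hypothetical reading of "lift" as relative change over the baseline.

```python
import math
from collections import Counter
from typing import List, Sequence


def hit_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """1.0 if the ground-truth item appears in the top k, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """NDCG@K with one relevant item: the ideal DCG is 1, so the score is
    1 / log2(rank + 1) for a 1-based rank, or 0.0 if the item is missed."""
    for pos, item in enumerate(ranked[:k]):  # pos is 0-based
        if item == target:
            return 1.0 / math.log2(pos + 2)
    return 0.0


def token_counts(titles: Sequence[str]) -> Counter:
    """Count lowercase whitespace-separated tokens (assumed tokenization)."""
    return Counter(tok for title in titles for tok in title.lower().split())


def entropy(titles: Sequence[str]) -> float:
    """Shannon entropy (bits) of the token distribution; assumed definition."""
    counts = token_counts(titles)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def vocabulary_size(titles: Sequence[str]) -> int:
    """Number of distinct tokens across the titles; assumed definition."""
    return len(token_counts(titles))


def lift(new: float, base: float) -> float:
    """Relative change over the baseline; undefined when base is 0."""
    return (new - base) / base


ranked: List[str] = ["usb cable", "usb hub", "mouse pad"]
print(hit_at_k(ranked, "usb hub", k=2))
print(ndcg_at_k(ranked, "usb hub", k=2))
print(entropy(ranked[:2]), vocabulary_size(ranked[:2]))
```

Under these assumed definitions, a reranker can raise the diversity metrics simply by surfacing items whose titles use words that the baseline's top results do not, which is one way to read "broader item coverage".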