Bridging Language and Items for Retrieval and Recommendation

Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley

TL;DR

The paper tackles the challenge of linking natural language with item metadata for retrieval and recommendation by introducing BLaIR, a RoBERTa-based series of sentence-embedding models trained with a language–item contrastive objective on the new Amazon Reviews 2023 dataset. It also introduces a complex product search task and a semi-synthetic Amazon-C4 evaluation to test generalization across domains and contexts. The authors show that BLaIR yields superior text-based item representations across sequential recommendation and product search benchmarks, with strong gains from multi-domain pretraining and scalable model variants. They release the dataset, code, and checkpoints to facilitate future research in language-heavy, cross-domain recommendation systems.
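The TL;DR names a language-item contrastive objective without spelling it out. The sketch below assumes the common in-batch InfoNCE form. Each natural-language context is paired with the metadata of its own item, and every other item in the batch acts as a negative. Similarity is cosine similarity divided by a temperature `tau`. The function names, the plain-list vectors, and the `tau=0.05` default are illustrative assumptions, not BLaIR's released code. The paper's exact loss may differ, for example in its temperature or in any additional terms.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(contexts: List[Vector], items: List[Vector], tau: float = 0.05) -> float:
    """Mean InfoNCE loss where contexts[i] is paired with items[i] and all
    other items in the batch serve as negatives (assumed formulation)."""
    if not contexts or len(contexts) != len(items):
        raise ValueError("expected a non-empty batch of aligned (context, item) pairs")
    total = 0.0
    for i, ctx in enumerate(contexts):
        # Temperature-scaled similarity of this context to every item in the batch.
        logits = [cosine(ctx, item) / tau for item in items]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log softmax probability assigned to the matching item.
        total += log_denominator - logits[i]
    return total / len(contexts)


# Toy batch: each context vector lies closest to its own item vector.
contexts = [[1.0, 0.0], [0.0, 1.0]]
items = [[0.9, 0.1], [0.1, 0.9]]
aligned = in_batch_contrastive_loss(contexts, items)
swapped = in_batch_contrastive_loss(contexts, items[::-1])
print(aligned < swapped)  # True
```

Reusing the other items in the batch as negatives avoids a separate negative-sampling step. Larger batches supply more negatives per update.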

Abstract

This paper introduces BLaIR, a series of pretrained sentence embedding models specialized for recommendation scenarios. BLaIR is trained to learn correlations between item metadata and potential natural language context, which is useful for retrieving and recommending items. To pretrain BLaIR, we collect Amazon Reviews 2023, a new dataset comprising over 570 million reviews and 48 million items from 33 categories, significantly expanding beyond the scope of previous versions. We evaluate the generalization ability of BLaIR across multiple domains and tasks, including a new task named complex product search, referring to retrieving relevant items given long, complex natural language contexts. Leveraging large language models like ChatGPT, we correspondingly construct a semi-synthetic evaluation set, Amazon-C4. Empirical results on the new task, as well as conventional retrieval and recommendation tasks, demonstrate that BLaIR exhibits strong text and item representation capacity. Our datasets, code, and checkpoints are available at: https://github.com/hyp1231/AmazonReviews2023.
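Complex product search, as the abstract describes it, comes down to scoring every candidate item against the embedding of a long natural-language context and returning the closest matches. The following is a minimal sketch under several assumptions. The query and item metadata are taken to be already encoded into vectors. The toy 3-dimensional vectors, item IDs, and helper names are hypothetical stand-ins for real BLaIR embeddings. Recall@K is computed assuming a single ground-truth item per query.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_items(query: Sequence[float], item_embeddings: Dict[str, Sequence[float]]) -> List[str]:
    """Return item IDs sorted by descending similarity to the query embedding."""
    return sorted(
        item_embeddings,
        key=lambda item_id: cosine(query, item_embeddings[item_id]),
        reverse=True,
    )


def recall_at_k(ranked: List[str], relevant_item: str, k: int) -> float:
    """1.0 if the single ground-truth item is in the top k, else 0.0."""
    return 1.0 if relevant_item in ranked[:k] else 0.0


# Hypothetical embeddings standing in for encoded item metadata.
items = {
    "tent": [0.9, 0.1, 0.0],
    "headphones": [0.0, 0.2, 0.9],
    "sleeping_bag": [0.7, 0.6, 0.0],
}
# Hypothetical embedding of a long request such as a multi-sentence camping-trip description.
query = [0.8, 0.3, 0.1]

ranked = rank_items(query, items)
print(ranked)  # ['tent', 'sleeping_bag', 'headphones']
print(recall_at_k(ranked, "sleeping_bag", k=1), recall_at_k(ranked, "sleeping_bag", k=2))  # 0.0 1.0
```

In a real pipeline, item embeddings would typically be computed once and indexed, so only the query needs encoding at search time. For catalogs at the scale of Amazon Reviews 2023, an approximate nearest-neighbor index would usually replace this linear scan.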

Paper Structure

This paper contains 12 sections, 3 equations, 1 figure, and 9 tables.

Figures (1)

  • Figure 1: The overview of BLaIR.