Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning

Aleksandr Petrov; Craig Macdonald

TL;DR

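GPTRec generates recommendations item-by-item (Next-K strategy) instead of scoring and ranking all items at once (Top-K strategy). This paper trains GPTRec in two stages, teacher-student pre-training followed by Reinforcement Learning, and shows that in 3 out of 4 cases its Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.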

Abstract

Adaptations of Transformer models, such as BERT4Rec and SASRec, achieve state-of-the-art performance in the sequential recommendation task according to accuracy-based metrics, such as NDCG. These models treat items as tokens and then utilise a score-and-rank approach (Top-K strategy), where the model first computes item scores and then ranks the items according to these scores. While this approach works well for accuracy-based metrics, it is hard to use for optimising more complex beyond-accuracy metrics such as diversity. Recently, the GPTRec model, which uses a different Next-K strategy, has been proposed as an alternative to the Top-K models. In contrast with traditional Top-K recommendations, Next-K generates recommendations item-by-item and, therefore, can account for complex item-to-item interdependencies important for the beyond-accuracy measures. However, the original GPTRec paper focused only on accuracy in experiments and did not address how to optimise the model for complex beyond-accuracy metrics. Indeed, training GPTRec for beyond-accuracy goals is challenging because the interaction training data available for training recommender systems is typically not aligned with beyond-accuracy recommendation goals. To solve the misalignment problem, we train GPTRec using a 2-stage approach: in the first stage, we use a teacher-student approach to train GPTRec, mimicking the behaviour of traditional Top-K models; in the second stage, we use Reinforcement Learning to align the model for beyond-accuracy goals. In particular, we experiment with increasing recommendation diversity and reducing popularity bias. Our experiments on two datasets show that in 3 out of 4 cases, GPTRec's Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.
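
The difference between the two strategies described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the item names, the `RELEVANCE` and `GENRE` tables, and the hand-written `toy_score` function are hypothetical and are not taken from the paper. In GPTRec the conditional scores would come from the trained Transformer; the fixed genre penalty used here is only a stand-in that makes the control flow visible.

```python
from typing import Callable, Dict, List, Sequence


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Top-K: score every item once, independently, then rank."""
    return sorted(scores, key=lambda item: scores[item], reverse=True)[:k]


def next_k(
    score_fn: Callable[[str, Sequence[str]], float],
    candidates: Sequence[str],
    k: int,
) -> List[str]:
    """Next-K: pick items one by one, re-scoring the remaining
    candidates given the items already recommended."""
    recommended: List[str] = []
    remaining = list(candidates)
    while remaining and len(recommended) < k:
        best = max(remaining, key=lambda item: score_fn(item, recommended))
        recommended.append(best)
        remaining.remove(best)
    return recommended


# Hypothetical toy data (not from the paper).
RELEVANCE = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
GENRE = {"a": "action", "b": "action", "c": "comedy", "d": "drama"}


def toy_score(item: str, prefix: Sequence[str]) -> float:
    """Illustrative stand-in for a learned conditional score:
    relevance minus a penalty for each already-recommended item
    that shares the candidate's genre."""
    repeats = sum(1 for chosen in prefix if GENRE[chosen] == GENRE[item])
    return RELEVANCE[item] - 0.5 * repeats


if __name__ == "__main__":
    print(top_k(RELEVANCE, 3))                     # ['a', 'b', 'c']
    print(next_k(toy_score, list(RELEVANCE), 3))   # ['a', 'c', 'd']
```

In this toy setting the Top-K list contains two action items because each score ignores the rest of the list, while the Next-K list covers three genres because every step conditions on the items chosen so far. With a hand-written penalty the loop behaves much like a greedy re-ranker; the sketch only shows where a learned, prefix-conditioned model would plug in.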

Paper Structure (29 sections, 13 equations, 6 figures, 3 tables)

This paper contains 29 sections, 13 equations, 6 figures, 3 tables.

Introduction
Related work
The Misalignment Problem and Reinforcement Learning with Human Feedback
Transformers for sequential recommendation
Training GPTRec for beyond-accuracy goals
Training Objectives
Pre-Train/Fine Tune approach
Efficient Asynchronous Decomposition of Reinforcement Fine-Tuning
Experimental Setup
Implementation
Datasets
Data Splitting
Effectiveness Metrics and Optimisation Goals
Baselines
Teacher Models for Supervised Learning
...and 14 more sections

Figures (6)

Figure 1: GPTRec's Pre-Training/Fine-Tuning scheme. Pre-training (Step 1) takes the form of using a Top-K model like BERT4Rec (teacher) to pre-train a GPTRec model checkpoint (student). In Step 2 (Fine-tuning), the Policy model $\pi$ is the GPTRec model itself, initialised from the student model checkpoint from Step 1; the Value model is a Transformer Decoder-based model with a regression head. The Transformer Decoder layer of the Value model is initialised from the Transformer Decoder layer of the student model, and the regression head is initialised randomly.
Figure 2: GPTRec fine-tuning processes diagram. Green boxes are processes; purple boxes are data.
Figure 3: Models' NDCG@K with Top-K and Next-K recommendation strategies when varying ranking cutoff $K$. Red arrows demonstrate the effectiveness gap between the Top-K and the Next-K inference strategies at a given ranking cutoff $K$.
Figure 4: Accuracy (NDCG@10) / Diversity (ILD@10) tradeoff. Arrows represent the direction of metric improvement, and horizontal and vertical lines represent standard errors.
Figure 5: Accuracy (NDCG@10) / Popularity Bias (nPCOUNT@10) tradeoff. Arrows represent the direction of metric improvement, and horizontal and vertical lines represent standard errors.
...and 1 more figures
