Are We Really Achieving Better Beyond-Accuracy Performance in Next Basket Recommendation?

Ming Li; Yuanna Liu; Sami Jullien; Mozhdeh Ariannezhad; Mohammad Aliannejadi; Andrew Yates; Maarten de Rijke

TL;DR

The paper tackles the challenge of balancing accuracy with beyond-accuracy metrics in next basket recommendation (NBR). It introduces TREx, a plug-and-play two-step framework that decouples repeat-item prediction (for accuracy) from explore-item prediction (for fairness and diversity), enabling controlled trade-offs in fixed-size baskets of size $k$. Through experiments on Instacart and Dunnhumby across eight metrics, TREx demonstrates that a “short-cut” strategy—maximizing repetition for accuracy while optimizing explore items for beyond-accuracy—can improve several fairness and diversity metrics without sacrificing overall accuracy, though benefits depend on how strongly a metric correlates with accuracy. The study also scrutinizes evaluation paradigms, arguing for fine-grained, per-subtask assessments to avoid overestimating beyond-accuracy gains and to better guide future NBR research. Overall, the work highlights both the potential of modular decoupling in NBR and the importance of robust, nuanced evaluation in multi-objective recommender design.

Abstract

Next basket recommendation (NBR) is a special type of sequential recommendation that is increasingly receiving attention. So far, most NBR studies have focused on optimizing the accuracy of the recommendation, whereas optimizing for beyond-accuracy metrics, e.g., item fairness and diversity, remains largely unexplored. Recent studies into NBR have found a substantial performance difference between recommending repeat items and explore items. Repeat items contribute most of the users' perceived accuracy compared with explore items. Informed by these findings, we identify a potential "short-cut" to optimize for beyond-accuracy metrics while maintaining high accuracy. To leverage and verify the existence of such short-cuts, we propose a plug-and-play two-step repetition-exploration (TREx) framework that treats repeat items and explore items separately, where we design a simple yet highly effective repetition module to ensure high accuracy, while two exploration modules target optimizing only beyond-accuracy metrics. Experiments are performed on two widely used datasets w.r.t. a range of beyond-accuracy metrics, viz. five fairness metrics and three diversity metrics. Our experimental results verify the effectiveness of TREx. Prima facie, this appears to be good news: we can achieve high accuracy and improved beyond-accuracy metrics at the same time. However, we argue that the real-world value of our algorithmic solution, TREx, is likely to be limited and reflect on the reasonableness of the evaluation setup. We end up challenging existing evaluation paradigms, particularly in the context of beyond-accuracy metrics, and provide insights for researchers to navigate potential pitfalls and determine reasonable metrics to consider when optimizing for accuracy and beyond-accuracy metrics.
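The two-step idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the repetition step here ranks items by plain purchase frequency (an assumption; the paper's repetition module is more elaborate), and `explore_score`, `max_repeat`, and `trex_basket` are hypothetical names introduced only for illustration. The sketch shows how repeat items can fill the basket first, with the remaining slots given to never-purchased items ranked by some beyond-accuracy objective.

```python
from collections import Counter


def trex_basket(history, catalog, explore_score, k=10, max_repeat=None):
    """Build a basket of at most k items in two decoupled steps.

    history:       list of past baskets (lists of item ids), oldest first
    catalog:       iterable of all candidate item ids
    explore_score: hypothetical callable, item id -> beyond-accuracy score
    max_repeat:    hypothetical cap on repeat items (None = no cap)
    """
    # Step 1 (repetition): rank previously purchased items.
    # Assumption: plain purchase frequency stands in for the repetition module.
    counts = Counter(item for basket in history for item in basket)
    repeat_ranked = [
        item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    n_repeat = k if max_repeat is None else min(k, max_repeat)
    basket = repeat_ranked[:n_repeat]

    # Step 2 (exploration): fill the remaining slots with items the user
    # has never bought, ranked only by the beyond-accuracy score.
    seen = set(counts)
    explore_ranked = sorted(
        (item for item in catalog if item not in seen),
        key=lambda item: (-explore_score(item), item),
    )
    basket += explore_ranked[: k - len(basket)]
    return basket


# Toy usage: favour unpopular items in the exploration slots.
history = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread"]]
catalog = ["milk", "bread", "eggs", "tofu", "kale", "salsa"]
popularity = {"tofu": 5, "kale": 1, "salsa": 3}  # made-up exposure counts


def inverse_popularity(item):
    return 1.0 / popularity[item]


capped = trex_basket(history, catalog, inverse_popularity, k=4, max_repeat=2)
uncapped = trex_basket(history, catalog, inverse_popularity, k=4)
```

In this toy setup, capping the repeat slots trades some likely-accurate repeat items for more exploration slots, which is the kind of controlled trade-off the summary refers to.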

Paper Structure

This paper contains 17 sections, 6 equations, 5 figures, 6 tables, and 1 algorithm.

Introduction
Related Work
Task Formulation and Definitions
Evaluation metrics
A Two-Step Repetition-Exploration Framework
Repetition module
Exploration module
Basket generation module
Experiments
Experimental setup
Simple baselines
Nearest neighbor-based methods
Neural network-based methods
Overall accuracy performance
Beyond-accuracy performance
...and 2 more sections

Figures (5)

Figure 1: Distribution of users across different repeat ratios for Instacart and Dunnhumby.
Figure 2: Performance of TREx-Rep when we add a time-decay factor $\beta$ (+T), add both $\beta$ and item-specific repetition feature $RepI(i)$ (+T+RF).
Figure 3: The recall improvement of (+T+RF) over (+T) when the training sample ratio changes from 0.2 to 1.
Figure 4: Performance of $\text{TREx}_{\text{Diversity}}$ at different $v$ values, compared with different NBR methods in terms of different diversity metrics. The red $+$ marker indicates the direction with both high accuracy and diversity.
Figure 5: Performance of $\text{TREx}_{\text{Fairness}}$ at different $v$ values, compared with different NBR methods in terms of different fairness metrics. The red $+$ marker indicates the direction with both high accuracy and fairness.
