Improving Sequential Recommenders through Counterfactual Augmentation of System Exposure

Ziqi Zhao, Zhaochun Ren, Jiyuan Yang, Zuming Yan, Zihan Wang, Liu Yang, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Xin Xin

TL;DR

CaseRec addresses exposure bias in sequential recommenders by modeling full system exposure with an offline reinforcement learning framework. It leverages a Decision Transformer to learn reward-conditioned exposure sequences and introduces two counterfactual augmentation strategies (Random and Self-Improving) plus a transformer-based user simulator to predict feedback for augmented items. The approach, validated on three real-world datasets, outperforms strong baselines and reduces exposure bias while enhancing diversity; code is provided for reproducibility. This work advances SR by converting exposure data into a learnable, reward-driven sequence, enabling more accurate and robust recommendations in realistic exposure settings.

Abstract

In sequential recommendation (SR), system exposure refers to the items that are exposed to the user. Typically, the user interacts with only a few of the exposed items. Although SR has achieved great success in predicting future user interests, existing SR methods still fail to fully exploit system exposure data. Most methods model only the items that have been interacted with, while the large volume of exposed but non-interacted items is overlooked. Even methods that consider the whole system exposure typically train the recommender using only the logged historical system exposure, without exploring unseen user interests. In this paper, we propose counterfactual augmentation over system exposure for sequential recommendation (CaseRec). To better model historical system exposure, CaseRec introduces reinforcement learning to account for different exposure rewards. CaseRec uses a decision transformer-based sequential model that takes an exposure sequence as input and assigns different rewards according to the user feedback. To further explore unseen user interests, CaseRec performs counterfactual augmentation, in which the originally exposed items are replaced with counterfactual items. A transformer-based user simulator then predicts the user feedback reward for the augmented items. Augmentation, together with the user simulator, constructs counterfactual exposure sequences to uncover new user interests. Finally, CaseRec jointly uses the logged exposure sequences and the counterfactual exposure sequences to train a decision transformer-based sequential model for generating recommendations. Experiments on three real-world benchmarks show the effectiveness of CaseRec. Our code is available at https://github.com/ZiqiZhao1/CaseRec.
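
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of the random replacement strategy, not the authors' implementation. It assumes a binary reward (1.0 for an interacted item, 0.0 for an exposed but non-interacted item), treats the augmentation ratio as the fraction of positions replaced within one sequence, and stands in a toy rule for the transformer-based user simulator. The names `Exposure`, `returns_to_go`, `augment_random`, and `toy_simulator` are hypothetical and do not come from the paper or its repository.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Exposure:
    """One exposed item with its feedback reward (assumed binary here)."""
    item: int
    reward: float  # assumption: 1.0 = interacted, 0.0 = exposed only


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Suffix sums of rewards, the conditioning signal used by Decision Transformers."""
    out: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]


def augment_random(
    seq: Sequence[Exposure],
    catalog: Sequence[int],
    simulator: Callable[[List[Exposure], int], float],
    ratio: float,
    rng: random.Random,
) -> List[Exposure]:
    """Replace a fraction `ratio` of positions with random counterfactual items.

    The reward of each replaced position is predicted by `simulator`,
    which sees the (already augmented) prefix and the candidate item.
    """
    n_replace = int(round(ratio * len(seq)))
    positions = set(rng.sample(range(len(seq)), n_replace))
    augmented: List[Exposure] = []
    for pos, exposure in enumerate(seq):
        if pos in positions:
            # Draw a counterfactual item that differs from the logged one.
            candidates = [i for i in catalog if i != exposure.item]
            new_item = rng.choice(candidates)
            reward = simulator(list(augmented), new_item)
            augmented.append(Exposure(new_item, reward))
        else:
            augmented.append(exposure)
    return augmented


def toy_simulator(history: List[Exposure], item: int) -> float:
    """Placeholder for a learned user simulator: pretend even items are liked."""
    return 1.0 if item % 2 == 0 else 0.0


logged = [Exposure(3, 0.0), Exposure(8, 1.0), Exposure(5, 0.0), Exposure(2, 1.0)]
counterfactual = augment_random(
    logged, catalog=range(10), simulator=toy_simulator, ratio=0.5, rng=random.Random(0)
)
logged_rtg = returns_to_go([e.reward for e in logged])
counterfactual_rtg = returns_to_go([e.reward for e in counterfactual])
```

In this sketch, both the logged sequence and its counterfactual variant would be converted to (return-to-go, item) pairs and used together as training data. A self-improving variant would presumably draw the replacement items from the recommender itself rather than uniformly from the catalog.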

Paper Structure

This paper contains 27 sections, 12 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: (a) Illustration of system exposure. Orange blocks refer to items interacted with by the user, while blue blocks refer to items exposed by the system but not interacted with by the user. The shadow $v_2$ denotes that $v_2$ could also interest the user but is not interacted with due to reasons like time limits. (b) System exposure is much larger than interacted data.
  • Figure 2: Overview of CaseRec. (a) The architecture of the DT-based sequential recommender, which takes system exposure as input and generates recommendations. Details can be found in Section \ref{sequtial_recommender}. (b) The architecture of the transformer-based user simulator, which predicts user feedback for a given item. Details can be found in Section \ref{usersimulator}. (c) The counterfactual augmentation process, which illustrates two strategies: Random and Self-Improving. Details can be found in Section \ref{data_augmentation}. The user simulator (b) is used to imitate user feedback for the counterfactual items generated by the data augmentation (c), and the sequential recommender (a) is trained on both logged exposure sequences and augmented counterfactual sequences.
  • Figure 3: Evaluation on recommendation diversity.
  • Figure 4: Impact of the augmentation ratio $\delta$. Grey dashed lines denote the best baseline.
  • Figure 5: Ablation Study.