Mitigating Spurious Correlations for Self-supervised Recommendation
Xinyu Lin, Yiyan Xu, Wenjie Wang, Yang Zhang, Fuli Feng
TL;DR
This work addresses spurious correlations in self-supervised recommendation by introducing Invariant Feature Learning (IFL), which automatically masks spurious features and blocks the transmission of their negative effects through mask-guided contrastive learning. IFL clusters interactions into multiple environments and applies a gradient-variance regularizer across them, so that the learned mask keeps features that stay predictive in every environment and discards misleading cues. The method optimizes a composite objective $\mathcal{L}=\mathcal{L}_{CF}+\alpha\mathcal{L}_{ssl}+\beta\mathcal{L}_v+\lambda\|\boldsymbol{\theta}\|^2$ and uses mask-driven data augmentation to stabilize SSL against spurious features. Empirical results on Meituan and XING show improved OOD generalization without sacrificing IID performance. The code is available at https://github.com/Linxyhaha/IFL.
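To make the composite objective concrete, the following is a minimal sketch of how its four terms could be combined. It assumes that $\mathcal{L}_v$ is the variance of per-environment gradients summed over parameters; the exact definition used by IFL is not given here, so this reading is an assumption. The function names `gradient_variance` and `composite_loss` and all numeric values are hypothetical.

```python
from statistics import pvariance


def gradient_variance(env_grads):
    """Sum over parameters of the variance of gradients across environments.

    env_grads: list of gradient vectors, one per environment (assumed layout).
    A value of 0 means every environment pushes the parameters the same way.
    """
    per_param = zip(*env_grads)  # regroup values by parameter index
    return float(sum(pvariance(col) for col in per_param))


def composite_loss(l_cf, l_ssl, l_v, theta, alpha, beta, lam):
    """L = L_CF + alpha * L_ssl + beta * L_v + lam * ||theta||^2."""
    l2 = sum(t * t for t in theta)  # squared L2 norm of the parameters
    return l_cf + alpha * l_ssl + beta * l_v + lam * l2


# Hypothetical gradients from two environments over two parameters.
l_v = gradient_variance([[0.0, 1.0], [2.0, 1.0]])
total = composite_loss(1.0, 2.0, l_v, [1.0, 2.0], alpha=0.5, beta=0.1, lam=0.01)
print(l_v, total)
```

In this toy case only the first parameter receives different gradients in the two environments, so it alone contributes to the variance penalty.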
Abstract
Recent years have witnessed the great success of self-supervised learning (SSL) in recommendation systems. However, SSL recommender models are likely to suffer from spurious correlations, leading to poor generalization. To mitigate spurious correlations, existing work usually pursues ID-based SSL recommendation or utilizes feature engineering to identify spurious features. Nevertheless, ID-based SSL approaches sacrifice the positive impact of invariant features, while feature engineering methods require high-cost human labeling. To address these problems, we aim to automatically mitigate the effect of spurious correlations. This objective requires us to 1) automatically mask spurious features without supervision, and 2) block the transmission of negative effects from spurious features to other features during SSL. To handle these two challenges, we propose an invariant feature learning framework, which first divides user-item interactions into multiple environments with distribution shifts and then learns a feature mask mechanism to capture invariant features across environments. Based on the mask mechanism, we can remove the spurious features for robust predictions and block the transmission of negative effects via mask-guided feature augmentation. Extensive experiments on two datasets demonstrate the effectiveness of the proposed framework in mitigating spurious correlations and improving the generalization abilities of SSL models. The code is available at https://github.com/Linxyhaha/IFL.
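As a rough illustration of mask-guided feature augmentation, the sketch below builds augmented views in which each feature is kept with a probability taken from a learned mask, so that features judged spurious are dropped more often. This is one possible reading of the idea, not the authors' implementation: the function `mask_guided_views`, the use of zero as the dropped value, and the example numbers are all assumptions.

```python
import random


def mask_guided_views(features, keep_prob, n_views=2, seed=0):
    """Create augmented views by dropping features according to a mask.

    features:  list of feature values for one user or item (hypothetical).
    keep_prob: per-feature keep probability in [0, 1], standing in for the
               learned mask; low values mark features judged spurious.
    """
    rng = random.Random(seed)  # local generator for reproducible views
    views = []
    for _ in range(n_views):
        # Keep feature x with probability p, otherwise zero it out.
        view = [x if rng.random() < p else 0.0
                for x, p in zip(features, keep_prob)]
        views.append(view)
    return views


# Feature 0 is treated as invariant, feature 1 as spurious, feature 2 as uncertain.
views = mask_guided_views([3.0, 5.0, 7.0], keep_prob=[1.0, 0.0, 0.5])
print(views)
```

Under this sketch, a contrastive loss computed between such views would never see the feature with keep probability 0, which is one way its influence could be kept out of the SSL signal.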
