Disentangling ID and Modality Effects for Session-based Recommendation
Xiaokun Zhang, Bo Xu, Zhaochun Ren, Xiaochen Wang, Hongfei Lin, Fenglong Ma
TL;DR
This paper introduces DIMO, a framework that separates co-occurrence signals from item modality cues to improve both accuracy and explainability in session-based recommendation (SBR). It learns ID representations via a global co-occurrence graph, aligns textual and visual modalities into a unified semantic space, and uses multi-view self-supervised learning (a proxy mechanism and counterfactual inference) to disentangle the two causes at the session level, with predictions driven by causal inference. Explanations are generated through co-occurrence and feature templates, enabling user-centric rationales grounded in the disentangled causes. Empirical results on four real-world datasets show consistent gains over state-of-the-art baselines and indicate that the explanations are meaningful and aligned with the underlying causes.
Abstract
Session-based recommendation aims to predict the intents of anonymous users based on their limited behaviors. Modeling user behaviors involves two distinct rationales: co-occurrence patterns reflected by item IDs, and fine-grained preferences represented by item modalities (e.g., text and images). However, existing methods typically entangle these causes, which prevents them from delivering accurate and explainable recommendations. To this end, we propose a novel framework, DIMO, to disentangle the effects of ID and modality in this task. At the item level, we introduce a co-occurrence representation schema to explicitly incorporate co-occurrence patterns into ID representations. Simultaneously, DIMO aligns different modalities into a unified semantic space to represent them uniformly. At the session level, we present a multi-view self-supervised disentanglement, comprising a proxy mechanism and counterfactual inference, to disentangle ID and modality effects without supervision signals. Leveraging these disentangled causes, DIMO provides recommendations via causal inference and further creates two templates for generating explanations. Extensive experiments on multiple real-world datasets demonstrate the consistent superiority of DIMO over existing methods. Further analysis also confirms DIMO's effectiveness in generating explanations.
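
To make the notion of co-occurrence patterns concrete, the following is a minimal sketch of how a global co-occurrence graph might be counted from raw sessions. It is an illustration under stated assumptions, not DIMO's actual schema: the function names build_cooccurrence_graph and top_neighbors, the use of unordered pairs, and the toy sessions are all hypothetical, and the paper's representation learning on top of such a graph is not shown.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(sessions):
    """Count how often each unordered item pair appears in the same session.

    Assumption: an edge weight is the number of sessions containing both
    items; the abstract does not specify the exact weighting.
    """
    edge_weights = Counter()
    for session in sessions:
        # Deduplicate and sort so each pair has one canonical (small, large) key.
        unique_items = sorted(set(session))
        for a, b in combinations(unique_items, 2):
            edge_weights[(a, b)] += 1
    return edge_weights


def top_neighbors(edge_weights, item, k=2):
    """Return up to k items that co-occur most often with `item`."""
    neighbors = Counter()
    for (a, b), weight in edge_weights.items():
        if a == item:
            neighbors[b] += weight
        elif b == item:
            neighbors[a] += weight
    return neighbors.most_common(k)


# Toy sessions of hypothetical item IDs.
sessions = [[1, 2, 3], [2, 3], [1, 3, 4]]
graph = build_cooccurrence_graph(sessions)
neighbors_of_3 = top_neighbors(graph, 3, k=3)
```

In this toy data, items 1 and 3 share two sessions, so their edge carries more weight than the edge between items 1 and 2. A co-occurrence template for explanations could, in principle, draw on such weights, though how DIMO does so is not specified in the abstract.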
