Disentangling ID and Modality Effects for Session-based Recommendation

Xiaokun Zhang; Bo Xu; Zhaochun Ren; Xiaochen Wang; Hongfei Lin; Fenglong Ma

TL;DR

This paper introduces DIMO, a framework that separates co-occurrence signals from item modality cues to improve both accuracy and explainability in session-based recommendation (SBR). It learns ID representations from a global co-occurrence graph, aligns textual and visual modalities into a unified semantic space, and uses multi-view self-supervised learning (a proxy mechanism and counterfactual inference) to disentangle the two causes at the session level; predictions are then made via causal inference. Explanations are generated from co-occurrence and feature templates, giving user-facing rationales grounded in the disentangled causes. Experiments on four real-world datasets show consistent gains over state-of-the-art baselines and indicate that the explanations are meaningful and aligned with the underlying causes.

Abstract

Session-based recommendation aims to predict intents of anonymous users based on their limited behaviors. Modeling user behaviors involves two distinct rationales: co-occurrence patterns reflected by item IDs, and fine-grained preferences represented by item modalities (e.g., text and images). However, existing methods typically entangle these causes, leading to their failure in achieving accurate and explainable recommendations. To this end, we propose a novel framework, DIMO, to disentangle the effects of ID and modality in the task. At the item level, we introduce a co-occurrence representation schema to explicitly incorporate co-occurrence patterns into ID representations. Simultaneously, DIMO aligns different modalities into a unified semantic space to represent them uniformly. At the session level, we present a multi-view self-supervised disentanglement, including a proxy mechanism and counterfactual inference, to disentangle ID and modality effects without supervised signals. Leveraging these disentangled causes, DIMO provides recommendations via causal inference and further creates two templates for generating explanations. Extensive experiments on multiple real-world datasets demonstrate the consistent superiority of DIMO over existing methods. Further analysis also confirms DIMO's effectiveness in generating explanations.
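
To make the notion of co-occurrence patterns between item IDs concrete, the following is a minimal Python sketch of one way a global co-occurrence graph could be built from sessions. It is an illustration under stated assumptions, not the construction used in DIMO: the abstract does not specify how edges are defined or weighted, so the function names `build_cooccurrence_graph` and `neighbors`, the `window` parameter, and the choice of raw counts as edge weights are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(sessions, window=None):
    """Count co-occurring item pairs across all sessions.

    Assumption (not from the paper): two items co-occur when they appear
    in the same session, optionally at most `window` positions apart.
    Returns a Counter mapping a sorted item pair (a, b) to the number of
    position pairs in which the two items co-occurred.
    """
    edges = Counter()
    for session in sessions:
        for i, j in combinations(range(len(session)), 2):
            a, b = session[i], session[j]
            if a == b:
                continue  # ignore repeated clicks on the same item
            if window is not None and j - i > window:
                continue  # too far apart to count as co-occurring
            edges[tuple(sorted((a, b)))] += 1
    return edges


def neighbors(edges, item, top_k=3):
    """Return up to `top_k` (neighbor, count) pairs for `item`,
    ordered by descending count and then by item ID."""
    scored = []
    for (a, b), count in edges.items():
        if a == item:
            scored.append((b, count))
        elif b == item:
            scored.append((a, count))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy sessions with hypothetical item IDs.
sessions = [["a", "b", "c"], ["a", "b"], ["b", "c", "d"]]
graph = build_cooccurrence_graph(sessions)
top_for_b = neighbors(graph, "b", top_k=2)
```

In this toy example, items `a` and `b` share two sessions, so their edge carries more weight than the edge between `a` and `c`. A model could use such weights to inject co-occurrence information into ID embeddings, but how DIMO does this is described in the paper's co-occurrence representation schema, not here.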

Paper Structure

This paper contains 34 sections, 12 equations, 7 figures, and 3 tables.

Introduction
Related Work
Session-based Recommendation
Explainable Recommendation
Disentanglement in Recommendation
Preliminaries
Problem Formulation
Global Co-occurrence Graph Construction
The Proposed DIMO
ID and Modality Representation Learning
Co-occurrence Representation Schema
Modality Alignment
Sequence Encoding
Multi-view Self-supervised Disentanglement
Proxy Mechanism
...and 19 more sections

Figures (7)

Figure 1: Two distinct rationales for modeling user behaviors: co-occurrence patterns of ID; fine-grained preferences of modality.
Figure 2: Global co-occurrence graph construction.
Figure 3: The architecture of DIMO. ID and modality representation learning explicitly incorporates co-occurrence patterns into ID embeddings while conducting modality alignment for unified modality representation. Multi-view self-supervised disentanglement distinguishes ID and modality effects via proxy mechanism and counterfactual inference. Based on disentangled causes, DIMO provides recommendation via causal inference and generates explanations on two kinds of templates.
Figure 4: Effect of multi-view self-supervised disentanglement.
Figure 5: Case study for explainable session-based recommendation.
...and 2 more figures
