Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity

Yu Hou, Jin-Duk Park, Won-Yong Shin

TL;DR

CF-Diff is a diffusion-model-based collaborative filtering framework that explicitly exploits high-order connectivity through a cross-attention-guided multi-hop autoencoder (CAM-AE). It pairs a forward-diffusion process on user-item interactions with a reverse-denoising model that integrates multi-hop neighbor information via a multi-hop cross-attention module and an attention-aided autoencoder, achieving strong recommendation performance while remaining scalable. The authors provide theoretical results showing that CAM-AE embeddings closely approximate those of full cross-attention, with computational cost that scales linearly with the number of users or items, and they validate the method on three real-world datasets, reporting gains of up to 7.29% in NDCG@10 over the best competitor. Overall, the work shows that diffusion models can capture complex collaborative signals in CF and scale to large systems, offering a practical alternative to traditional GNN-based CF methods.

Abstract

A recent study has shown that diffusion models are well-suited for modeling the generative process of user-item interactions in recommender systems due to their denoising nature. However, existing diffusion model-based recommender systems do not explicitly leverage high-order connectivity, which contains crucial collaborative signals for accurate recommendations. Addressing this gap, we propose CF-Diff, a new diffusion model-based collaborative filtering (CF) method, which makes full use of collaborative signals along with multi-hop neighbors. Specifically, the forward-diffusion process adds random noise to user-item interactions, while the reverse-denoising process accommodates our own learning model, named cross-attention-guided multi-hop autoencoder (CAM-AE), to gradually recover the original user-item interactions. CAM-AE consists of two core modules: 1) the attention-aided AE module, responsible for precisely learning latent representations of user-item interactions while keeping the model's complexity at manageable levels, and 2) the multi-hop cross-attention module, which judiciously harnesses high-order connectivity information to capture enhanced collaborative signals. Through comprehensive experiments on three real-world datasets, we demonstrate that CF-Diff is (a) Superior: outperforming benchmark recommendation methods, achieving remarkable gains up to 7.29% compared to the best competitor, (b) Theoretically-validated: reducing computations while ensuring that the embeddings generated by our model closely approximate those from the original cross-attention, and (c) Scalable: exhibiting computational cost that scales linearly with the number of users or items.
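
The forward-diffusion step described in the abstract can be illustrated with a minimal sketch. The abstract only says that random noise is added to user-item interactions; the Gaussian noise, the linear schedule, and the names `linear_alpha_bar` and `forward_diffuse` below are assumptions chosen for illustration (the standard closed form used by denoising diffusion models), not the authors' implementation.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule.

    The linear schedule and its endpoints are assumptions for illustration.
    """
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    scale = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
alpha_bars = linear_alpha_bar(num_steps=50)

# Toy binary interaction vector for one user over six items (hypothetical data).
u0 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
u_early = forward_diffuse(u0, 0, alpha_bars, rng)   # lightly corrupted
u_late = forward_diffuse(u0, 49, alpha_bars, rng)   # more heavily corrupted
```

In this sketch the reverse-denoising model (CAM-AE in the paper) would be trained to recover `u0` from samples such as `u_late`; that part is not shown because its details are not given in this summary.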

Paper Structure

This paper contains 24 sections, 2 theorems, 15 equations, 6 figures, 4 tables.

Key Result

Theorem 1

Suppose that $\max\{|\mathcal{U}|, |\mathcal{I}|\}$ is sufficiently large. If $k \ge 5\ln\left(\max\{|\mathcal{U}|, |\mathcal{I}|\}\right) / \cdots$ ... where $\mathbf{Q} \in \mathbb{R}^{\max\{|\mathcal{U}|, |\mathcal{I}|\} \cdots}$ ...

Figures (6)

  • Figure 1: Illustration showing (a) neighbors of User 1 and User 3 up to 3 hops and (b) how such high-order connectivity information can be potentially encoded and infused into the diffusion model-based learning system. Here, $\{\mathbf{u}_0, \cdots, \mathbf{u}_T\}$ are the encoded information of direct user-item interactions at each step, and $\mathbf{u}'$ is the encoded high-order connectivity information.
  • Figure 2: The schematic overview of CF-Diff when both $2$-hop and $3$-hop neighboring nodes are taken into account.
  • Figure 3: Extraction and encoding of 2-hop and 3-hop neighbors of the target user (User 1) as well as direct neighbors for a given bipartite graph.
  • Figure 4: The effect of hyperparameters $k$ and $d$ on N@K for the ML-1M dataset.
  • Figure 5: The effect of hyperparameters $N$ and $\alpha$ on N@K for the ML-1M dataset.
  • ...and 1 more figure
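
The multi-hop neighborhoods mentioned in the captions of Figures 1 and 3 can be illustrated with a breadth-first search over the user-item bipartite graph. This is a sketch under stated assumptions: a hop is counted as one edge on a shortest path from the target user, the function name `hop_neighbors` and the toy interaction list are hypothetical, and the paper's own definition and encoding of multi-hop neighbors may differ.

```python
from collections import deque


def hop_neighbors(interactions, target_user, max_hops):
    """Return {hop: set of nodes} at exactly `hop` edges from the target user.

    Nodes are tagged tuples, ("u", id) for users and ("i", id) for items,
    so that user and item ids cannot collide.
    """
    adj = {}
    for user, item in interactions:
        u, i = ("u", user), ("i", item)
        adj.setdefault(u, set()).add(i)
        adj.setdefault(i, set()).add(u)

    start = ("u", target_user)
    dist = {start: 0}
    queue = deque([start])
    hops = {h: set() for h in range(1, max_hops + 1)}
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the requested radius
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                hops[dist[nxt]].add(nxt)
                queue.append(nxt)
    return hops


# Toy bipartite graph (hypothetical data): users 1-3, items "a"-"d".
interactions = [(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "c"), (3, "d")]
hops = hop_neighbors(interactions, target_user=1, max_hops=3)
```

Under this hop convention, User 1's direct neighbors are items `a` and `b`, the 2-hop neighbor is User 2 (who shares item `b`), and the 3-hop neighbor is item `c`.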

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2