ITEM: Improving Training and Evaluation of Message-Passing based GNNs for top-k recommendation

Yannis Karmim, Elias Ramzi, Raphaël Fournier-S'niehotta, Nicolas Thome

TL;DR

The paper addresses the misalignment between training objectives and ranking-based evaluation in MP-GNNs for top-$k$ recommendation. It introduces ITEM, a framework that directly optimizes a differentiable, ranking-based loss via a smooth rank proxy and a Personalized PageRank-based offline negative sampling strategy, coupled with an inductive user-centric evaluation protocol. Empirical results across four datasets and multiple GNN architectures show ITEM surpasses standard BPR training and several advanced losses, with faster convergence and stronger inductive generalization. This work demonstrates the practicality and effectiveness of ranking-loss optimization for graph-based collaborative filtering in realistic, user-centric scenarios.

Abstract

Graph Neural Networks (GNNs), especially message-passing-based models, have become prominent in top-k recommendation tasks, outperforming matrix factorization models due to their ability to efficiently aggregate information from a broader context. Although GNNs are evaluated with ranking-based metrics, e.g., NDCG@k and Recall@k, they remain largely trained with proxy losses, e.g., the BPR loss. In this work we explore the use of ranking loss functions to directly optimize the evaluation metrics, an area not extensively investigated in the GNN community for collaborative filtering. We take advantage of smooth approximations of the rank to facilitate end-to-end training of GNNs and propose a Personalized PageRank-based negative sampling strategy tailored for ranking loss functions. Moreover, we extend the evaluation of GNN models for top-k recommendation tasks with an inductive user-centric protocol, providing a more accurate reflection of real-world applications. Our proposed method significantly outperforms the standard BPR loss and more advanced losses across four datasets and four recent GNN architectures while also exhibiting faster training, demonstrating the potential of ranking loss functions in improving GNN training for collaborative filtering tasks.
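The gap between a pairwise proxy loss and a ranking metric, and the idea of a smooth rank, can be illustrated with a small sketch. It is not the paper's implementation: the sigmoid-based rank surrogate, the function names, and the default temperature `tau` are assumptions chosen for illustration, since the paper's exact rank approximation is not reproduced in this summary. The BPR and NDCG@k definitions are the standard ones.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def bpr_loss(pos_score, neg_score):
    # Standard pairwise BPR loss for one (positive, negative) item pair.
    return -math.log(sigmoid(pos_score - neg_score))

def dcg_at_k(relevance_in_rank_order, k):
    # DCG@k for relevances already sorted by predicted rank (position 0 = rank 1).
    return sum(rel / math.log2(pos + 2)
               for pos, rel in enumerate(relevance_in_rank_order[:k]))

def ndcg_at_k(scores, relevance, k):
    # Exact NDCG@k: sort by score, then normalise by the ideal DCG@k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal_dcg = dcg_at_k(sorted(relevance, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k([relevance[i] for i in order], k) / ideal_dcg

def smooth_rank(scores, i, tau=0.1):
    # Assumed surrogate: rank(i) ~= 1 + sum over j != i of sigmoid((s_j - s_i) / tau).
    return 1.0 + sum(sigmoid((s_j - scores[i]) / tau)
                     for j, s_j in enumerate(scores) if j != i)

def smooth_ndcg(scores, relevance, tau=0.1):
    # NDCG over the whole list, with the hard rank replaced by the smooth rank.
    ideal_dcg = dcg_at_k(sorted(relevance, reverse=True), len(relevance))
    if ideal_dcg == 0:
        return 0.0
    dcg = sum(rel / math.log2(1.0 + smooth_rank(scores, i, tau))
              for i, rel in enumerate(relevance))
    return dcg / ideal_dcg

scores = [2.0, 1.0, 0.5, 0.1]   # hypothetical user-item scores
relevance = [1, 0, 1, 0]        # 1 = positive item, 0 = negative item
exact = ndcg_at_k(scores, relevance, k=4)
approx = smooth_ndcg(scores, relevance, tau=0.01)
```

With a small `tau` the smooth value approaches the exact metric, while a larger `tau` gives a softer surrogate with more informative gradients. In a real training loop the same expression would be written in an autodiff framework and minimised as `1 - smooth_ndcg`.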

Paper Structure (43 sections, 7 equations, 7 figures, 10 tables)

Figures (7)

  • Figure 1: The goal in top-$k$ recommendation is to recommend to a user, e.g. $u_q$ (purple), relevant items such as $i_{p_4}$ (in green), based on the user's interaction history, i.e. items in blue such as $i_{p_1}$. ITEM directly optimizes the evaluation metric, i.e. NDCG, during training using a smooth approximation of the rank and Personalized PageRank (Page1998TheWeb) based negative sampling. Best seen in color.
  • Figure 2: Using message passing, a GNN creates embeddings for every node of the graph. For each user we first construct a batch of randomly sampled positive items and of negative items selected with our Personalized PageRank (Page1998TheWeb) based negative sampling (Eq. \ref{eq:pprneg}). We then compute the score of the user with respect to the batch of items and calculate the loss using the approximation of the rank in Eq. \ref{eq:def_approx_rank}. Finally, the loss is backpropagated to update the parameters of the GNN and, in the transductive setting, the embeddings of the items and users. Best seen in color.
  • Figure 3: Transductive interaction-split (left) vs. inductive user-split (right) (Meng2020ExploringModels). In the first case, the same users appear in training and test, so their learned embeddings can be used directly at test time. In the second case, a subset of the users and 100% of their interactions are used for training. During evaluation, the model infers a representation of a new test user from some of that user's interactions (fold-in) in order to predict the fold-out items, on which the ranking metrics are computed.
  • Figure 4: $\tau$ in Eq. \ref{eq:def_approx_rank} vs. R@20 and NDCG@20 on MovieLens-100k (inductive) with LightGCN (He2020LightGCN:Recommendation).
  • Figure 5: Qualitative results on MovieLens-100k. We compare the ranking obtained using LightGCN trained with the baseline BPR loss (bottom row) to the ranking obtained using ITEM (top row). Positive elements are highlighted in green, while negative elements are indicated in red.
  • ...and 2 more figures
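
The negative sampling step described in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's sampling rule: the selection rule here (drawing non-interacted items with probability proportional to their Personalized PageRank score from the user) is an assumption, and the function names, the restart probability `alpha`, and the toy graph are hypothetical. Because the scores depend only on the interaction graph, they can be computed once before training, which matches the "offline" nature of the strategy.

```python
import random

def personalized_pagerank(adj, source, alpha=0.15, n_iter=50):
    # Power iteration for PPR with restart probability `alpha` at `source`.
    # `adj` maps every node (user or item) to its list of neighbours.
    scores = {node: 0.0 for node in adj}
    scores[source] = 1.0
    for _ in range(n_iter):
        nxt = {node: 0.0 for node in adj}
        for node, mass in scores.items():
            neighbours = adj[node]
            if not neighbours:
                # Dangling node: send its walking mass back to the source.
                nxt[source] += (1.0 - alpha) * mass
                continue
            share = (1.0 - alpha) * mass / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        nxt[source] += alpha  # restart mass (total mass stays equal to 1)
        scores = nxt
    return scores

def sample_negatives(adj, user, items, n_neg, rng, alpha=0.15):
    # Assumed rule: sample unseen items proportionally to their PPR score,
    # so items close to the user in the graph act as harder negatives.
    ppr = personalized_pagerank(adj, user, alpha)
    seen = set(adj[user])
    candidates = [item for item in items if item not in seen]
    weights = [ppr[item] + 1e-12 for item in candidates]  # avoid all-zero weights
    return rng.choices(candidates, weights=weights, k=n_neg)

# Toy bipartite interaction graph (hypothetical data).
adj = {
    "u1": ["i1", "i2"], "u2": ["i2", "i3"], "u3": ["i4"],
    "i1": ["u1"], "i2": ["u1", "u2"], "i3": ["u2"], "i4": ["u3"],
}
items = ["i1", "i2", "i3", "i4"]
ppr_u1 = personalized_pagerank(adj, "u1")
negatives = sample_negatives(adj, "u1", items, n_neg=5, rng=random.Random(0))
```

In this toy graph, `i3` is reachable from `u1` through the shared item `i2`, so it receives a much higher sampling weight than `i4`, which lies in a disconnected component. Note that `random.choices` samples with replacement; a real pipeline might deduplicate or sample without replacement.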