A Topology-Aware Positive Sample Set Construction and Feature Optimization Method in Implicit Collaborative Filtering

Jiayi Wu, Zhengyu Wu, Xunkai Li, Rong-Hua Li, Guoren Wang

TL;DR

This work tackles false negatives in implicit collaborative filtering by turning unexposed items into constructive supervision signals. It introduces a topology-aware framework (TPSC-FO) with two core components: topology-aware positive sample set construction (TPSC) that uses consensus differential community detection to identify likely false negatives, and a neighborhood-guided feature optimization (FO) that denoises positive sample embeddings via neighborhood mixture. The method is model-agnostic and demonstrates state-of-the-art gains across five real-world and two synthetic datasets, with robust performance across different community detectors and embedding solvers, and it can enhance existing negative sampling strategies. Practically, TPSC-FO provides a principled way to leverage latent false negatives to improve user preference learning while controlling noise in positive supervisory signals, offering a scalable path to stronger implicit CF systems.

Abstract

Negative sampling strategies are widely used in implicit collaborative filtering to address issues like data sparsity and class imbalance. However, these methods often introduce false negatives, hindering the model's ability to accurately learn users' latent preferences. To mitigate this problem, existing methods adjust the negative sampling distribution based on statistical features from model training or the hardness of negative samples. Nevertheless, these methods face two key limitations: (1) over-reliance on the model's current representation capabilities; (2) failure to leverage the potential of false negatives as latent positive samples to guide model learning of user preferences more accurately. To address the above issues, we propose a Topology-aware Positive Sample Set Construction and Feature Optimization method (TPSC-FO). First, we design a simple topological community-aware false negative identification (FNI) method and observe that topological community structures in interaction networks can effectively identify false negatives. Motivated by this, we develop a topology-aware positive sample set construction module. This module employs a differential community detection strategy to capture topological community structures in implicit feedback, coupled with personalized noise filtration to reliably identify false negatives and convert them into positive samples. Additionally, we introduce a neighborhood-guided feature optimization module that refines positive sample features by incorporating neighborhood features in the embedding space, effectively mitigating noise in the positive samples. Extensive experiments on five real-world datasets and two synthetic datasets validate the effectiveness of TPSC-FO.
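The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: plain label propagation stands in for the differential community detection strategy, the personalized noise filtration step is omitted, and the feature optimization is approximated by blending an embedding with the mean of its nearest neighbors. The function names, the toy data, and the parameters `k` and `alpha` are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
import math


def detect_communities(interactions, max_iters=50):
    """Asynchronous label propagation on the user-item bipartite graph.

    A simple stand-in detector (assumption), not the paper's differential
    strategy. Ties go to the smallest label so the result is deterministic.
    """
    adj = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            adj[user].add(item)
            adj[item].add(user)
    labels = {node: node for node in adj}  # every node starts in its own community
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def candidate_false_negatives(interactions, labels):
    """Uninteracted items that fall in the same community as the user."""
    all_items = set().union(*interactions.values())
    return {
        user: sorted(i for i in all_items - set(seen)
                     if labels.get(i) == labels.get(user))
        for user, seen in interactions.items()
    }


def mix_with_neighbors(emb, item, k=2, alpha=0.5):
    """Blend an item's embedding with the mean of its k nearest neighbors.

    Hypothetical mixing rule: alpha * own + (1 - alpha) * neighbor mean.
    """
    others = sorted((o for o in emb if o != item),
                    key=lambda o: (math.dist(emb[item], emb[o]), o))[:k]
    if not others:
        return list(emb[item])
    dim = len(emb[item])
    mean = [sum(emb[o][d] for o in others) / len(others) for d in range(dim)]
    return [alpha * x + (1 - alpha) * m for x, m in zip(emb[item], mean)]


# Toy implicit feedback: two dense groups joined by one bridging edge (u2-i4).
interactions = {
    "u1": ["i1", "i2"],
    "u2": ["i1", "i2", "i3", "i4"],
    "u3": ["i4", "i5"],
    "u4": ["i4", "i5", "i6"],
}
labels = detect_communities(interactions)
candidates = candidate_false_negatives(interactions, labels)
print(candidates)

# Toy 2-D item embeddings; refine the newly added positive "i3".
emb = {"i1": [0.0, 0.0], "i2": [2.0, 0.0], "i3": [0.0, 2.0], "i6": [10.0, 10.0]}
print(mix_with_neighbors(emb, "i3", k=2, alpha=0.5))
```

On this toy graph, the sketch flags `i3` as a candidate for `u1` and `i6` as a candidate for `u3`, and the bridging edge does not merge the two groups. In the paper's pipeline such candidates would pass through the personalized noise filtration step before being treated as positive samples.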

Paper Structure

This paper contains 17 sections, 9 equations, 9 figures, 6 tables, 3 algorithms.

Figures (9)

  • Figure 1: Impact of positive and negative sample similarity.
  • Figure 2: The performance of ComFNI on Amazon-beauty and Epinions.
  • Figure 3: Comparison of approaches: (a) Negative sampling relies heavily on the recommendation model’s representation quality. Because it draws negative samples only from uninteracted items, it cannot exploit false negatives as latent positive samples. (b) Drop- and reweight-based denoising results in sparse positive supervisory signals. (c) TPSC-FO first employs the model-agnostic TPSC module to identify false negatives and convert them into positive samples, then uses the FO module to denoise the positive samples, avoiding the issue of sparse positive supervisory signals.
  • Figure 4: Impact of positive and negative sample similarity.
  • Figure 5: Recall@20 vs. wall-clock time (in seconds).
  • ...and 4 more figures