A Topology-Aware Positive Sample Set Construction and Feature Optimization Method in Implicit Collaborative Filtering
Jiayi Wu, Zhengyu Wu, Xunkai Li, Rong-Hua Li, Guoren Wang
TL;DR
This work addresses false negatives in implicit collaborative filtering by converting likely false negatives into positive supervision signals. The proposed topology-aware framework (TPSC-FO) has two components: topology-aware positive sample set construction (TPSC), which uses consensus differential community detection to identify likely false negatives, and neighborhood-guided feature optimization (FO), which denoises positive sample embeddings by mixing in neighborhood features. The method is model-agnostic and reports state-of-the-art results on five real-world and two synthetic datasets. It remains robust across different community detectors and embedding solvers, and it can enhance existing negative sampling strategies. In practice, TPSC-FO offers a principled way to exploit latent false negatives for user preference learning while controlling noise in the positive supervisory signal.
Abstract
Negative sampling strategies are widely used in implicit collaborative filtering to address issues such as data sparsity and class imbalance. However, these strategies often introduce false negatives, which hinder the model's ability to learn users' latent preferences accurately. To mitigate this problem, existing methods adjust the negative sampling distribution based on statistical features from model training or on the hardness of negative samples. These methods face two key limitations: (1) they rely too heavily on the model's current representation capability; (2) they fail to exploit false negatives as latent positive samples that could guide the model toward more accurate user preferences. To address these issues, we propose a Topology-aware Positive Sample Set Construction and Feature Optimization method (TPSC-FO). First, we design a simple topological community-aware false negative identification (FNI) method and observe that topological community structures in interaction networks can effectively identify false negatives. Motivated by this, we develop a topology-aware positive sample set construction module. This module employs a differential community detection strategy to capture topological community structures in implicit feedback, coupled with personalized noise filtering, to reliably identify false negatives and convert them into positive samples. Additionally, we introduce a neighborhood-guided feature optimization module that refines positive sample features by incorporating neighborhood features in the embedding space, effectively mitigating noise in the positive samples. Extensive experiments on five real-world datasets and two synthetic datasets validate the effectiveness of TPSC-FO.
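The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the TPSC-FO implementation: it assumes community labels have already been produced by some community detector on the user-item interaction graph, it omits the differential detection strategy and the personalized noise filtering, and it replaces the feature optimization module with a plain convex mix. The function names, the `community_of` mapping, and the `alpha` weight are all hypothetical.

```python
from collections import defaultdict


def candidate_false_negatives(interactions, community_of):
    """Return, per user, unobserved items that share the user's community.

    interactions: dict mapping user id -> set of interacted item ids.
    community_of: dict mapping node id (user or item) -> community label.
        Assumed to come from any community detector; hypothetical input.
    """
    # Group all known items by their community label.
    items_by_comm = defaultdict(set)
    all_items = set().union(*interactions.values())
    for item in all_items:
        if item in community_of:
            items_by_comm[community_of[item]].add(item)

    # An item in the user's community that the user has not interacted
    # with is treated as a candidate false negative.
    candidates = {}
    for user, seen in interactions.items():
        same_comm = items_by_comm.get(community_of.get(user), set())
        candidates[user] = same_comm - seen
    return candidates


def mix_with_neighbors(embedding, neighbor_embeddings, alpha=0.5):
    """Blend a positive sample's embedding with the mean of its neighbors.

    alpha is an assumed mixing weight: 1.0 keeps the original embedding,
    0.0 replaces it with the neighborhood mean.
    """
    if not neighbor_embeddings:
        return list(embedding)
    count = len(neighbor_embeddings)
    mean = [sum(n[d] for n in neighbor_embeddings) / count
            for d in range(len(embedding))]
    return [alpha * e + (1 - alpha) * m for e, m in zip(embedding, mean)]


# Toy data: u1 and u2 share a community with items i1..i3.
interactions = {"u1": {"i1", "i2"}, "u2": {"i2", "i3"}, "u3": {"i4"}}
community_of = {"u1": 0, "u2": 0, "i1": 0, "i2": 0, "i3": 0,
                "u3": 1, "i4": 1}
candidates = candidate_false_negatives(interactions, community_of)
mixed = mix_with_neighbors([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]], alpha=0.5)
```

In the toy data, `i3` becomes a candidate positive for `u1` because both sit in community 0 and `u1` has not interacted with it. A real pipeline would still need a filtering step before trusting such candidates, which is the role the abstract assigns to personalized noise filtering.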
