Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models
Tianrui Song, Wen-Shuo Chao, Hao Liu
TL;DR
This work investigates hard-noisy sample confusion in implicit-feedback recommender systems and introduces LLMHNI, which leverages two auxiliary signals from large language models: semantic relevance from LLM-encoded embeddings and logical relevance from LLM-inferred interactions. The framework comprises two components: semantic-relevance-guided hard negative mining with objective-aligned embeddings, and logical-relevance-guided interaction denoising with cross-graph contrastive alignment and hallucination-robust learning. The components are jointly optimized as $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{rec}} + \lambda_{1} \mathcal{L}_{\text{de}} + \lambda_{2} \mathcal{L}_{\text{hal}}$. Extensive experiments on three real-world datasets and two backbone recommenders show significant improvements in denoising and recommendation performance, along with robustness under increasing noise levels. Embedding alignment addresses the mismatch between general language objectives and user-item relevance modeling, while the graph contrastive strategies limit errors induced by LLM hallucination.
Abstract
Implicit feedback, employed in training recommender systems, unavoidably contains noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their divergent data patterns, such as higher loss values, and to mitigate their influence through sample dropping or reweighting. However, we observe that noisy samples and hard samples display similar patterns, leading to a hard-noisy confusion issue. Such confusion is problematic because hard samples are vital for modeling user preferences. To solve this problem, we propose the LLMHNI framework, which leverages two auxiliary user-item relevance signals generated by Large Language Models (LLMs) to differentiate hard and noisy samples. LLMHNI obtains user-item semantic relevance from LLM-encoded embeddings and uses it in negative sampling to select hard negatives while filtering out noisy false negatives. An objective alignment strategy is proposed to project LLM-encoded embeddings, originally trained for general language tasks, into a representation space optimized for user-item relevance modeling. LLMHNI also exploits LLM-inferred logical relevance within user-item interactions to identify hard and noisy samples. These LLM-inferred interactions are integrated into the interaction graph and guide denoising through cross-graph contrastive alignment. To eliminate the impact of unreliable interactions induced by LLM hallucination, we propose a graph contrastive learning strategy that aligns representations from randomly edge-dropped views to suppress unreliable edges. Empirical results demonstrate that LLMHNI significantly improves denoising and recommendation performance.
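
As a rough illustration of the negative-sampling idea described in the abstract, the sketch below ranks a user's non-interacted items by semantic relevance, skips the ones that are so relevant they are likely false negatives, and keeps the most relevant of the rest as hard negatives. This is a minimal sketch under stated assumptions, not the paper's procedure: the use of cosine similarity, the fixed cutoff `false_neg_threshold`, and the names `cosine` and `select_hard_negatives` are all hypothetical, and the embeddings are assumed to be already projected into the relevance space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_hard_negatives(user_emb, item_embs, interacted,
                          false_neg_threshold=0.9, k=2):
    """Pick up to k hard negatives for one user.

    user_emb:  user embedding (assumed already aligned to the relevance space)
    item_embs: dict mapping item id -> item embedding
    interacted: set of item ids the user has interacted with (positives)
    false_neg_threshold: hypothetical cutoff; unobserved items at or above it
        are treated as likely false negatives and are not sampled
    """
    candidates = []
    for item_id, emb in item_embs.items():
        if item_id in interacted:
            continue  # observed positives are never negatives
        score = cosine(user_emb, emb)
        if score >= false_neg_threshold:
            continue  # too relevant: likely a noisy false negative
        candidates.append((score, item_id))
    # Most relevant remaining items first; ties broken by item id for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in candidates[:k]]


user = [1.0, 0.0]
items = {
    "a": [1.0, 0.0],    # interacted positive
    "b": [0.99, 0.05],  # nearly identical to the user: filtered as false negative
    "c": [0.7, 0.7],
    "d": [0.5, 0.8],
    "e": [0.0, 1.0],
}
hard_negatives = select_hard_negatives(user, items, interacted={"a"})
```

In this toy setup, item `b` is excluded by the cutoff and the two most relevant remaining items, `c` and `d`, are selected. A fixed threshold is the simplest possible filter; a rank-based or learned cutoff would fit the same interface.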
