DLRREC: Denoising Latent Representations via Multi-Modal Knowledge Fusion in Deep Recommender Systems

Jiahao Tian, Zhenkai Wang

TL;DR

The paper addresses how recommender systems can exploit the high-dimensional, noisy multi-modal features produced by LLMs. It introduces DLRREC, a unified end-to-end framework that co-trains a dimensionality reduction module with the ranking objective and injects collaborative filtering signals into the latent representations through a multi-relational contrastive learning objective. The key contributions are the co-trained dimensionality reduction integrated into a DLRM backbone and the SwING-InfoNCE-based user-user and item-item contrastive losses, which yield richer and more discriminative embeddings. On a restaurant-review dataset, the approach improves accuracy and sharply reduces false positives relative to a two-step baseline. The authors present it as a practical, open-source module for broader adoption.

Abstract

Modern recommender systems struggle to effectively utilize the rich, yet high-dimensional and noisy, multi-modal features generated by Large Language Models (LLMs). Treating these features as static inputs decouples them from the core recommendation task. We address this limitation with a novel framework built on a key insight: deeply fusing multi-modal and collaborative knowledge for representation denoising. Our unified architecture introduces two primary technical innovations. First, we integrate dimensionality reduction directly into the recommendation model, enabling end-to-end co-training that makes the reduction process aware of the final ranking objective. Second, we introduce a contrastive learning objective that explicitly incorporates the collaborative filtering signal into the latent space. This synergistic process refines raw LLM embeddings, filtering noise while amplifying task-relevant signals. Extensive experiments confirm our method's superior discriminative power, proving that this integrated fusion and denoising strategy is critical for achieving state-of-the-art performance. Our work provides a foundational paradigm for effectively harnessing LLMs in recommender systems.
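
To make the contrastive objective mentioned above more concrete, here is a minimal sketch of a generic InfoNCE loss for one anchor embedding. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine` and `info_nce`, the use of cosine similarity, and the `temperature` value are all hypothetical choices, and the way positive user-user or item-item pairs are selected is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for a single anchor (illustrative assumption).

    The loss is the negative log of the softmax probability assigned to
    the positive among the positive and all negatives, using
    temperature-scaled cosine similarities as logits.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Hypothetical 2-d embeddings: the anchor is close to its positive
# and far from both negatives, so the loss should be small.
anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Swapping in an orthogonal "positive" should give a larger loss.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In an end-to-end setup like the one the abstract describes, a term of this kind would be added to the ranking loss so that gradients from both objectives shape the reduced embeddings; the relative weighting of the two terms is left open here.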

Paper Structure

This paper contains 17 sections, 2 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Proposed Unified Model Architecture with Contrastive-Learning Based Dimension Reduction Module