Cross-domain Recommender Systems via Multimodal Domain Adaptation

Adamya Shyam, Ramya Kamani, Venkateswara Rao Kagita, Vikas Kumar

TL;DR

The paper tackles data sparsity in collaborative filtering by proposing a cross-domain recommender system that uses multimodal information to align embeddings across domains. It introduces two domain adaptation variants, VDAR and FDAR, that fuse textual and visual features with latent interaction factors and train a domain classifier to align source and target embeddings in a semi-supervised setting. Experiments on four Amazon domains show that FDAR consistently outperforms state-of-the-art baselines in both single-domain and cross-domain scenarios, with statistically significant improvements. The work demonstrates the value of combining multimodal representations with domain-adaptive alignment for robust recommendations in data-sparse environments, and it has the potential to mitigate cold-start issues.

Abstract

Collaborative Filtering (CF) has emerged as one of the most prominent implementation strategies for building recommender systems. The key idea is to exploit the usage patterns of individuals to generate personalized recommendations. CF techniques, especially for newly launched platforms, often face a critical issue known as the data sparsity problem, which greatly limits their performance. Cross-domain CF alleviates the problem of data sparsity by finding a common set of entities (users or items) across the domains, which then act as a conduit for knowledge transfer. Nevertheless, most real-world datasets are collected from different domains, so they often lack information about anchor points or reference information for entity alignment. This paper introduces a domain adaptation technique to align the embeddings of entities across domains. Our approach first exploits the available textual and visual information to independently learn a multi-view latent representation for each entity in the auxiliary and target domains. The different representations of the entity are then fused to generate the corresponding unified representation. A domain classifier is then trained to learn the embedding for the domain alignment by fixing the unified features as the anchor points. Experiments on four publicly available benchmark datasets indicate the effectiveness of our proposed approach.
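
The fusion and domain-classifier steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fusion rule (L2-normalise each view, then concatenate), the logistic-regression classifier, the function names, and the synthetic data are all assumptions made for illustration. Only the 300-dimensional visual code follows the paper's Figure 2 caption. The sketch trains a classifier to tell source entities from target entities and reports its accuracy, which can serve as a rough indicator of how far apart the two domains still are in the fused space.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so no single view dominates the fusion."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def fuse_views(text_feat, visual_feat, latent):
    """Assumed fusion rule: concatenate the L2-normalised views per entity."""
    views = [l2_normalize(text_feat), l2_normalize(visual_feat), l2_normalize(latent)]
    return np.concatenate(views, axis=1)


def train_domain_classifier(x, y, lr=0.5, epochs=200):
    """Logistic regression by batch gradient descent (y: 0 = source, 1 = target)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(domain = target)
        grad = p - y                             # gradient of log-loss w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def domain_accuracy(x, y, w, b):
    """Fraction of entities whose domain the classifier identifies correctly."""
    return float((((x @ w + b) > 0).astype(int) == y).mean())


# Synthetic stand-in data (hypothetical sizes; only d_vis = 300 comes from Figure 2).
rng = np.random.default_rng(0)
n, d_text, d_vis, d_lat = 200, 8, 300, 16

# Text and visual features share one space across domains; the latent
# interaction factors of the target domain are deliberately shifted.
src = fuse_views(rng.normal(size=(n, d_text)),
                 rng.normal(size=(n, d_vis)),
                 rng.normal(0.0, 1.0, size=(n, d_lat)))
tgt = fuse_views(rng.normal(size=(n, d_text)),
                 rng.normal(size=(n, d_vis)),
                 rng.normal(2.0, 1.0, size=(n, d_lat)))

x = np.vstack([src, tgt])
y = np.concatenate([np.zeros(n), np.ones(n)])

w, b = train_domain_classifier(x, y)
acc = domain_accuracy(x, y, w, b)
print(f"domain classifier training accuracy: {acc:.2f}")
```

In this toy setup, a high accuracy indicates that the latent factors of the two domains remain easy to separate. An alignment procedure would aim to update the latent factors, with the multimodal features held fixed, until the classifier can no longer tell the domains apart.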

Paper Structure (15 sections, 17 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Latent factor alignment of users and items in the source and target domains. The distributions of both entities vary across domains. The textual and visual features are extracted by the same feature extractor models and are thus mapped to the same spaces. These extracted features are leveraged to align the embeddings of both domains in the latent space.
  • Figure 2: Autoencoder for encoding images into a lower-dimensional space. The size of an input image is (64, 64, 3), and its corresponding encoded representation is a 300-dimensional vector. Different colors indicate different layers in the model. The visual representation of an item is taken from the middle layer, which gives the encoded representation.
  • Figure 3: Outline of the proposed approach.
  • Figure 4: Effect of learning rate ($\eta$) and regularization coefficient ($\lambda$) on TCF and FCF on the Cell Phones and Accessories dataset.
  • Figure 5: Critical Difference (CD) diagrams for the compared algorithms in the SDR and CDR scenarios.
  • ...and 1 more figure