Anchored Alignment: Preventing Positional Collapse in Multimodal Recommender Systems

Yonghun Jeong, David Yoon Suk Kang, Yeon-Chang Lee

Abstract

Multimodal recommender systems (MMRS) leverage images, text, and interaction signals to enrich item representations. However, recent alignment-based MMRSs that enforce a unified embedding space often blur modality-specific structures and exacerbate ID dominance. Therefore, we propose AnchorRec, a multimodal recommendation framework that performs indirect, anchor-based alignment in a lightweight projection domain. By decoupling alignment from representation learning, AnchorRec preserves each modality's native structure while maintaining cross-modal consistency and avoiding positional collapse. Experiments on four Amazon datasets show that AnchorRec achieves competitive top-N recommendation accuracy, while qualitative analyses demonstrate improved multimodal expressiveness and coherence. The codebase of AnchorRec is available at https://github.com/hun9008/AnchorRec.
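The abstract describes alignment performed indirectly, through a shared anchor in a lightweight projection domain, rather than directly between modality embeddings. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the linear projection heads, the per-item multimodal anchor (the role played by $\mathbf{p}_{\mathrm{mm}}^{i}$ in Figure 3), and the cosine-distance loss are illustrative choices, and the names `project` and `anchor_alignment_loss` are hypothetical. Only the projected vectors are pulled toward the anchor; the native ID, image, and text embeddings are never compared with one another.

```python
import math


def normalize(v):
    """L2-normalize a vector given as a list of floats (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(weight, v):
    """Hypothetical linear projection head: weight is a (d_out x d_in) list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def anchor_alignment_loss(modality_embs, heads, anchor_emb, anchor_head):
    """Hypothetical anchor-based alignment loss for a single item.

    Assumption: each modality embedding is mapped into a shared projection
    space by its own head, and each projection is pulled toward the projected
    multimodal anchor with a (1 - cosine) penalty. Modalities are never
    compared with each other directly, only with the anchor.
    """
    anchor = project(anchor_head, anchor_emb)
    loss = 0.0
    for name, emb in modality_embs.items():
        p = project(heads[name], emb)
        loss += 1.0 - cosine(p, anchor)
    return loss / len(modality_embs)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    embs = {"id": [1.0, 0.0], "image": [0.0, 1.0], "text": [1.0, 1.0]}
    heads = {name: identity for name in embs}
    print(anchor_alignment_loss(embs, heads, [1.0, 1.0], identity))
```

The intent of routing every modality through the anchor is that each head can absorb cross-modal differences while the underlying embeddings keep their own geometry. A real training loop would evaluate this loss inside an autodiff framework so that the heads (and whatever produces the anchor) receive gradients; this pure-Python version only computes the loss value.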

Paper Structure

This paper contains 10 sections, 3 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: t-SNE [maaten2008visualizing] visualization of embeddings from AlignRec [liu2024alignrec]: the left panel shows the global distribution, and the right panel provides a zoomed-in view revealing fine-grained local structure.
  • Figure 2: Overview of AnchorRec, highlighting the anchor-based projection and its alignment losses as the core mechanism.
  • Figure 3: Comparison of alignment strategies: (a) AnchorRec: anchor-based indirect alignment via a shared multimodal anchor ($\mathbf{p}_{\mathrm{mm}}^{i}$); (b) BM3 [zhou2023bootstrap]: direct similarity-based alignment across modalities; and (c) AlignRec [liu2024alignrec]: direct contrastive alignment between ID and multimodal embeddings.
  • Figure 4: Neighborhood overlap across embedding spaces.
  • Figure 5: Top-3 neighbors retrieved by AlignRec and AnchorRec for the same target item: (a) quantitative comparison using 2-hop proximity and text and vision similarities, and (b) qualitative comparison using images and textual keywords.
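Figure 4 reports neighborhood overlap across embedding spaces, but the caption does not define the metric. One plausible reading, sketched here purely as an assumption and not as the authors' evaluation code, is the mean Jaccard overlap between each item's top-k cosine nearest-neighbor sets in two spaces (for example, the ID space and the image space). The helper names `top_k_neighbors` and `neighborhood_overlap` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def top_k_neighbors(embs, i, k):
    """Indices of the k items most cosine-similar to item i, excluding i itself.

    Ties are broken by the smaller index so results are deterministic.
    """
    scores = [(cosine(embs[i], embs[j]), j) for j in range(len(embs)) if j != i]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return {j for _, j in scores[:k]}


def neighborhood_overlap(space_a, space_b, k):
    """Assumed metric: mean Jaccard overlap of per-item top-k neighbor sets.

    space_a[i] and space_b[i] must describe the same item i in two
    different embedding spaces. Requires k >= 1 and at least two items.
    """
    if len(space_a) != len(space_b):
        raise ValueError("both spaces must embed the same items")
    total = 0.0
    for i in range(len(space_a)):
        na = top_k_neighbors(space_a, i, k)
        nb = top_k_neighbors(space_b, i, k)
        total += len(na & nb) / len(na | nb)
    return total / len(space_a)


if __name__ == "__main__":
    space = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(neighborhood_overlap(space, space, k=1))
```

A value near 1 means two spaces agree on which items are close to each other; a value near 0 means they induce largely different local structure, which is the kind of contrast the figure compares across embedding spaces.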