Modality-Guided Mixture of Graph Experts with Entropy-Triggered Routing for Multimodal Recommendation

Ji Dai; Quan Fang; Dengsheng Cai

TL;DR

MAGNET couples interaction-conditioned expert routing with structure-aware graph augmentation, so that both what to fuse and how to fuse are explicitly controlled and interpretable.

Abstract

Multimodal recommendation enhances ranking by integrating user-item interactions with item content, which is particularly effective under sparse feedback and long-tail distributions. However, multimodal signals are inherently heterogeneous and can conflict in specific contexts, making effective fusion both crucial and challenging. Existing approaches often rely on shared fusion pathways, leading to entangled representations and modality imbalance. To address these issues, we propose MAGNET, a Modality-Guided Mixture of Adaptive Graph Experts Network with Progressive Entropy-Triggered Routing for Multimodal Recommendation, designed to enhance controllability, stability, and interpretability in multimodal fusion. MAGNET couples interaction-conditioned expert routing with structure-aware graph augmentation, so that both what to fuse and how to fuse are explicitly controlled and interpretable. At the representation level, a dual-view graph learning module augments the interaction graph with content-induced edges, improving coverage for sparse and long-tail items while preserving collaborative structure via parallel encoding and lightweight fusion. At the fusion level, MAGNET employs structured experts with explicit modality roles (dominant, balanced, and complementary), enabling a more interpretable and adaptive combination of behavioral, visual, and textual cues. To further stabilize sparse routing and prevent expert collapse, we introduce a two-stage entropy-weighting mechanism that monitors routing entropy. This mechanism automatically transitions training from an early coverage-oriented regime to a later specialization-oriented regime, progressively balancing expert utilization and routing confidence. Extensive experiments on public benchmarks demonstrate consistent improvements over strong baselines.
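As a rough illustration of the routing ideas named in the abstract, the sketch below shows sparse Top-K gating, a normalized routing-entropy measure, and a two-stage switch driven by that entropy. It is not the authors' implementation: the names `top_k_route`, `normalized_entropy`, and `EntropyTrigger`, the threshold and patience values, the trigger rule (switch once expert usage has stayed well balanced for several steps), and the loss weights are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Keep the k largest gate logits and renormalize them with a softmax.

    Returns a dict mapping expert index -> routing weight.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the value lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


class EntropyTrigger:
    """Hypothetical two-stage switch: stage 1 = coverage, stage 2 = specialization.

    Assumed rule: move to stage 2 once the batch-averaged expert usage has
    had normalized entropy >= threshold for `patience` consecutive updates.
    """

    def __init__(self, threshold=0.9, patience=3):
        self.threshold = threshold
        self.patience = patience
        self.streak = 0
        self.stage = 1

    def update(self, mean_usage):
        """mean_usage: batch-averaged routing distribution over experts."""
        if self.stage == 1:
            balanced = normalized_entropy(mean_usage) >= self.threshold
            self.streak = self.streak + 1 if balanced else 0
            if self.streak >= self.patience:
                self.stage = 2
        return self.stage

    def loss_weights(self):
        """(balance_weight, confidence_weight); the values are illustrative only."""
        return (1.0, 0.0) if self.stage == 1 else (0.1, 1.0)
```

In a training loop, one would call `update` once per step with the averaged routing distribution and use `loss_weights` to scale a load-balancing term against a routing-confidence term. The switch is one-way here, which is a design choice of this sketch and not something the abstract specifies.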

Paper Structure (65 sections, 34 equations, 14 figures, 7 tables, 2 algorithms)

Introduction
Related Work
From Collaborative Filtering to Graph-based Multimodal Recommendation
Contrastive and Self-Supervised Learning in Recommendation
Mixture of Experts (MoE) for Adaptive Fusion and Routing
Disentangled Representation Learning & Curriculum Optimization
Two-Stage Progressive Learning: From Exploration to Specialization
Methods
Problem Setup and Overview
Task Definition and Notations
Model Overview
High-order Graph Construction and Dual-view Structural Backbone
High-order Candidate Expansion.
Dual-view Structural Graphs.
Structural Encoding with Shallow Propagation.
...and 50 more sections

Figures (14)

Figure 1: Illustration of how consumers integrate multiple signals to make purchasing decisions: the dress looks visually appealing, but negative reviews mention discomfort, and her past experience with the same brand was neutral. The final choice reflects a balance of these factors.
Figure 2: Overview of our proposed MAGNET framework. The first row presents the end-to-end pipeline: (I) inputs user-item interactions and item-side visual/text features; (II) constructs content-induced edges via similarity and KNN retrieval to augment the graph; (III) performs dual-view encoding on observed and augmented views and fuses them into unified user/item representations; (IV) applies a routing-based sparse triplet MoE as the prediction head, routing each query to a sparse set of experts and aggregating their outputs under a unified training objective. The second row provides complementary details: (A) illustrates the triplet-template expert pool covering behavior/appearance/semantics patterns, and (B) shows the progressive entropy-guided routing schedule that transitions from exploration to specialization during training.
Figure 3: Hyper-parameter sensitivity of MAGNET-DV with respect to the triplet-template controls $(\alpha,\beta,\delta)$. Each subplot adopts a zoomed y-axis range to reveal subtle yet consistent performance variations. Hollow markers indicate the default setting used in all main experiments. We sweep each control over a discrete set.
Figure 4: Sensitivity of MAGNET to expert capacity $E$ and Top-$K$ routing (metric: $N@20$). Upper: family-combination under $E\le 9$ (E1-E5). Lower: expert-splitting with $E{=}9p$ (E1-E5). K1-K6 denote Top-$K$ routing with $K\in\{1,\dots,6\}$; invalid cells with $K>E$ are marked as "$\backslash$".
Figure 5: Analysis of 9-expert routing and usage patterns across domains and modalities. Left: Expert usage radar over a 9-expert pool. Middle: Modality reliance and coverage statistics. Right: Fusion regime composition and diversity metrics.
...and 9 more figures
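
Step (II) in the Figure 2 caption, building content-induced edges via similarity and KNN retrieval, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the paper's procedure: cosine similarity, the helper names `knn_item_edges` and `augment_interactions`, and the rule for turning item-item neighbors into extra user-item candidates are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_item_edges(features, k):
    """Return directed (item, neighbor) pairs for each item's k most similar items.

    features: list of per-item content vectors (e.g. visual or text embeddings).
    Ties are broken by the smaller item index.
    """
    edges = []
    n = len(features)
    for i in range(n):
        scored = [(cosine(features[i], features[j]), j) for j in range(n) if j != i]
        scored.sort(key=lambda s: (-s[0], s[1]))
        edges.extend((i, j) for _, j in scored[:k])
    return edges


def augment_interactions(interactions, item_edges):
    """Assumed expansion rule: for every observed (user, item) pair, also
    connect the user to that item's content neighbors.

    Returns a set containing the observed pairs plus the added candidates.
    """
    neighbors = {}
    for i, j in item_edges:
        neighbors.setdefault(i, []).append(j)
    augmented = set(interactions)
    for user, item in interactions:
        for j in neighbors.get(item, []):
            augmented.add((user, j))
    return augmented
```

The observed interactions and the augmented set would then serve as the two views for dual-view encoding. The brute-force neighbor search is quadratic in the number of items, so a real system would likely use an approximate nearest-neighbor index instead.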
