Training-free Adjustable Polynomial Graph Filtering for Ultra-fast Multimodal Recommendation

Yu-Seung Roh, Joo-Young Kim, Jin-Duk Park, Won-Yong Shin

Abstract

Multimodal recommender systems improve on canonical recommender systems, which use no item features, by utilizing diverse content types such as text, images, and videos, while alleviating the inherent sparsity of user-item interactions and accelerating user engagement. However, current neural network-based models often incur significant computational overhead because of the complex training process required to learn and integrate information from multiple modalities. To address this challenge, we propose a training-free multimodal recommendation method grounded in graph filtering that achieves efficient and accurate recommendation. Specifically, the proposed method first constructs multiple similarity graphs for two distinct modalities as well as user-item interaction data. Then, it optimally fuses these multimodal signals using a polynomial graph filter that allows for precise control of the frequency response by adjusting frequency bounds. Furthermore, the filter coefficients are treated as hyperparameters, enabling flexible and data-driven adaptation. Extensive experiments on real-world benchmark datasets demonstrate that the proposed method not only improves recommendation accuracy by up to 22.25% compared to the best competitor but also dramatically reduces computational costs, achieving a runtime of less than 10 seconds.

Paper Structure

This paper contains 28 sections, 1 theorem, 16 equations, 9 figures, 7 tables, 1 algorithm.

Key Result

Theorem 4.1

For any admissible range of eigenvalues, the matrix polynomial can be expressed as: where $\bar{P}^f_\star$ is the graph filter associated with the similarity graph $\bar{P}_\star$, having the frequency response function of $h(\lambda)=\sum_{k=1}^K\frac{a_k}{(\lambda^\star)^{k-1}} (\lambda^\star-\lambda)^k$, and $\lambda^\star = \lambda^\star_{max} - \lambda^\star_{min}$. Here, $\l
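To make the frequency response in Theorem 4.1 concrete, the following minimal sketch evaluates $h(\lambda)=\sum_{k=1}^K\frac{a_k}{(\lambda^\star)^{k-1}} (\lambda^\star-\lambda)^k$ with $\lambda^\star = \lambda^\star_{max} - \lambda^\star_{min}$ on a set of eigenvalues. The function name `frequency_response`, the coefficient values, and the eigenvalue bounds are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def frequency_response(lam, coeffs, lam_min, lam_max):
    """Evaluate h(lam) = sum_{k=1}^K a_k / lam_star^(k-1) * (lam_star - lam)^k.

    lam     : eigenvalue(s) at which to evaluate the response
    coeffs  : sequence [a_1, ..., a_K] of filter coefficients (hyperparameters)
    lam_min : lower frequency bound (assumed known for the graph)
    lam_max : upper frequency bound (assumed known for the graph)
    """
    lam = np.asarray(lam, dtype=float)
    lam_star = lam_max - lam_min  # width of the eigenvalue range
    h = np.zeros_like(lam)
    # k runs from 1 to K, matching the summation index in the theorem.
    for k, a_k in enumerate(coeffs, start=1):
        h += a_k / lam_star ** (k - 1) * (lam_star - lam) ** k
    return h


# Hypothetical settings: a linear filter (K = 1) and a quadratic one (K = 2).
linear = frequency_response([0.0, 0.5, 1.0], coeffs=[1.0], lam_min=0.0, lam_max=1.0)
quadratic = frequency_response([0.0, 2.0], coeffs=[1.0, 0.5], lam_min=0.0, lam_max=2.0)
```

With a single coefficient $a_1 = 1$ and $\lambda^\star = 1$, the response reduces to $h(\lambda) = 1 - \lambda$, so it decreases as $\lambda$ grows. Changing the coefficients or the bounds reshapes the response without any training, which is the sense in which the abstract treats them as adjustable hyperparameters.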

Figures (9)

  • Figure 1: Training time comparison of two GCN-based MRSs under different degrees of modality information on the Baby dataset.
  • Figure 2: Histogram of eigenvalues for item--item similarity graphs constructed from user--item interactions and multimodal information on the Baby dataset. The left panels show the case of $\alpha=0.5$ and $s=1$, corresponding to symmetric normalization, while the right panels show the case of $\alpha=0.7$ and $s=0.6$, representing asymmetric normalization with an additional adjustment process. Red dashed lines indicate the minimum eigenvalue of each graph, while blue dashed lines indicate the maximum eigenvalue.
  • Figure 3: The schematic overview of MM-GF.
  • Figure 4: The impact of outliers and negative values (leading to singularities).
  • Figure 5: Log-scaled runtime comparison of MM-GF (with GPU and CPU) and MGCN (GPU) using various scaled synthetic datasets.
  • ...and 4 more figures

Theorems & Definitions (1)

  • Theorem 4.1