Mirror Gradient: Towards Robust Multimodal Recommender Systems via Exploring Flat Local Minima

Shanshan Zhong, Zhongzhan Huang, Daifeng Li, Wushao Wen, Jinghui Qin, Liang Lin

TL;DR

This paper analyzes multimodal recommender systems from the novel perspective of flat local minima and proposes a concise yet effective gradient strategy called Mirror Gradient (MG), which can implicitly enhance the model's robustness during the optimization process, mitigating instability risks arising from multimodal information inputs.

Abstract

Multimodal recommender systems utilize various types of information to model user preferences and item features, helping users discover items aligned with their interests. The integration of multimodal information mitigates the inherent challenges in recommender systems, e.g., the data sparsity problem and cold-start issues. However, it simultaneously magnifies certain risks from multimodal information inputs, such as information adjustment risk and inherent noise risk. These risks pose crucial challenges to the robustness of recommendation models. In this paper, we analyze multimodal recommender systems from the novel perspective of flat local minima and propose a concise yet effective gradient strategy called Mirror Gradient (MG). This strategy can implicitly enhance the model's robustness during the optimization process, mitigating instability risks arising from multimodal information inputs. We also provide strong theoretical evidence and conduct extensive empirical experiments to show the superiority of MG across various multimodal recommendation models and benchmarks. Furthermore, we find that the proposed MG can complement existing robust training methods and be easily extended to diverse advanced recommendation models, making it a promising new and fundamental paradigm for training multimodal recommender systems. The code is released at https://github.com/Qrange-group/Mirror-Gradient.

Paper Structure

This paper contains 14 sections, 2 theorems, 5 equations, 5 figures, 12 tables, and 1 algorithm.

Key Result

Lemma 1

(dherin2021geometric; huang2021rethinking) Consider a neural network $f(x)$ with $L$ layers and learnable parameters $\theta$. Let $h_i$, $1 \leq i \leq L$, denote the feature map from the $i$-th layer. For any scalar function $g$ of $h_L$, we have

Figures (5)

  • Figure 1: An illustrative example of multimodal risks. Merchants add popular tags (e.g., "ins style") and broad keywords (e.g., "suit") to the text of the bodysuit to increase the likelihood of the item being recommended. At the same time, merchants change the item's visual features in real time to fit Women's Day marketing campaigns and to emphasize the superiority of the item's material. These actions make it difficult for the recommender system to accurately determine the target user for the current item, leading to incorrect recommendations for young girls.
  • Figure 2: Illustration of flat local minima. When the distribution of the inputs shifts, for example under the risks of inherent noise and information adjustment, the loss landscape $\ell_o$ of the recommender system shifts accordingly (to $\ell_s$). The parameters $\theta_a$ located in a flat local minimum are more robust than $\theta_b$ located in a sharp local minimum.
  • Figure 3: Visualization of local minima. Training loss landscapes of FREEDOM and BM3 on the Baby dataset, trained with and without MG.
  • Figure 4: Convergence of MG on the Baby dataset.
  • Figure 5: Recall for different $(\alpha_1, \alpha_2)$ on Baby.
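
The flat-versus-sharp intuition described for Figure 2 can be illustrated with a toy one-dimensional sketch. This is not the paper's method or experiment: the quadratic wells, the curvature values, the shift size, and the names `quadratic_well` and `loss_increase_under_shift` are all assumptions made for illustration. The input shift is modeled, as a simplification, by translating the loss landscape sideways and measuring how much the loss rises at parameters that stay where they were.

```python
# Toy illustration (assumed setup, not the paper's experiment):
# compare how much the loss rises at a flat vs. a sharp minimum
# when the loss landscape is translated by a small shift.


def quadratic_well(theta: float, center: float, curvature: float) -> float:
    """Loss of a 1-D quadratic well centered at `center`."""
    return 0.5 * curvature * (theta - center) ** 2


def loss_increase_under_shift(center: float, curvature: float, shift: float) -> float:
    """Loss increase at the old minimizer after the well moves by `shift`.

    The parameter stays at `center` (the minimizer of the original
    landscape), while the shifted landscape has its minimum at
    `center + shift`.
    """
    original = quadratic_well(center, center, curvature)          # zero by construction
    shifted = quadratic_well(center, center + shift, curvature)   # loss after the shift
    return shifted - original


shift = 0.1  # hypothetical size of the landscape shift

# theta_a: flat minimum (low curvature); theta_b: sharp minimum (high curvature).
flat_increase = loss_increase_under_shift(center=-1.0, curvature=1.0, shift=shift)
sharp_increase = loss_increase_under_shift(center=1.0, curvature=100.0, shift=shift)

print(f"flat minimum loss increase:  {flat_increase:.4f}")
print(f"sharp minimum loss increase: {sharp_increase:.4f}")
```

For a quadratic well the increase equals half the curvature times the squared shift, so under the same shift the sharp minimum in this sketch is penalized in proportion to its larger curvature. Real recommender loss landscapes are high-dimensional and non-quadratic, so this only conveys the qualitative picture.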

Theorems & Definitions (2)

  • Lemma 1
  • Theorem 1