Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation

Yueqi Wang, Zhenrui Yue, Huimin Zeng, Dong Wang, Julian McAuley

TL;DR

This work introduces fMRLRec, a train-once, deploy-anywhere framework for multimodal sequential recommendation that uses full-scale Matryoshka representation learning to produce models at multiple granularities from a single training pass. By embedding smaller models into larger ones through an efficient fMRLRec operator and using Linear Recurrent Units for sequence processing, the approach achieves strong performance while drastically reducing memory costs. The framework integrates language and image modalities via a simple projection and includes an enhanced training objective that aligns the multi-size models. Empirical results on four Amazon benchmarks show superior ranking performance, with notable gains on sparser datasets, and the model series offers substantial parameter savings compared to training multiple independent models. This work provides a practical pathway to scalable, versatile recommender systems that can adapt to diverse deployment constraints.

Abstract

Despite recent advancements in language and vision modeling, integrating rich multimodal knowledge into recommender systems continues to pose significant challenges. This is primarily due to the need for efficient recommendation, which requires adaptive and interactive responses. In this study, we focus on sequential recommendation and introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec). Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions. To integrate item features from diverse modalities, fMRLRec employs a simple mapping to project multimodal item features into an aligned feature space. Additionally, we design an efficient linear transformation that embeds smaller features into larger ones, substantially reducing memory requirements for large-scale training on recommendation data. Combined with improved state space modeling techniques, fMRLRec scales to different dimensions and requires only one-time training to produce multiple models tailored to various granularities. We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets, where it consistently outperforms state-of-the-art baseline methods. We make our code and data publicly available at https://github.com/yueqirex/fMRLRec.

Paper Structure

This paper contains 30 sections, 18 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: fMRLRec-based weight design. White cells indicate zeros, and arrows show vector-matrix multiplication. During training, input slice $[0:m]$ is only relevant to weight matrix slice $[0:m,0:km]$, which makes it convenient to extract weights for models of various sizes at inference time.
  • Figure 2: The overall architecture for fMRLRec.
  • Figure 3: Performance of the fMRLRec model series against model size. Performance drops significantly more slowly than model size shrinks: for example, drop rates range from 6.14% to 37.69% (Recall@10 for Clothing), compared to a model compression rate of 50%.
  • Figure 4: fMRL trains all model sizes $\mathcal{M}=\{2,4,\ldots,2^n\}$ in a single run, saving $\approx 33\%$ of the parameters compared to training every size independently.
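
The nested weight design in Figure 1 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The caption does not fully specify the zero pattern, so the sketch assumes one plausible layout: for each size $m$, rows at or beyond $m$ are zeroed in columns $[0:km]$, so that a single full-size product contains every sub-model's output as a prefix. The helper names `nested_mask` and `extract_submodel`, the sizes, and the expansion factor `k` are all hypothetical.

```python
import numpy as np


def nested_mask(sizes, k):
    """Boolean mask of shape (d, k*d) with d = max(sizes).

    ASSUMPTION (not confirmed by the source): for every m in `sizes`,
    rows >= m are zeroed in columns < k*m, so output columns [0:k*m]
    depend only on input entries [0:m].
    """
    sizes = sorted(sizes)
    d = sizes[-1]
    mask = np.ones((d, k * d), dtype=bool)
    for m in sizes:
        mask[m:, : k * m] = False
    return mask


def extract_submodel(weight, m, k):
    """Slice the size-m weights [0:m, 0:k*m] out of the full matrix."""
    return weight[:m, : k * m]


rng = np.random.default_rng(0)
sizes, k = [2, 4, 8], 2  # hypothetical model sizes and expansion factor
d = max(sizes)

# Full-size weight with the assumed zero pattern applied.
W = rng.normal(size=(d, k * d)) * nested_mask(sizes, k)
x = rng.normal(size=d)

# One full-size vector-matrix product ...
full_out = x @ W

# ... reproduces each extracted sub-model's output as a prefix slice.
for m in sizes:
    small_out = x[:m] @ extract_submodel(W, m, k)
    assert np.allclose(small_out, full_out[: k * m])
```

Under this assumed layout, deploying a smaller model amounts to slicing the trained matrix. No retraining is involved, which is the property the Figure 1 caption highlights for inference-time extraction.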