Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation
Yueqi Wang, Zhenrui Yue, Huimin Zeng, Dong Wang, Julian McAuley
TL;DR
This work introduces fMRLRec, a train-once, deploy-anywhere framework for multimodal sequential recommendation that uses full-scale Matryoshka representation learning to produce models at multiple granularities from a single training run. By embedding smaller models into larger ones through an efficient fMRLRec operator and using Linear Recurrent Units for sequence processing, the approach achieves strong performance while substantially reducing memory costs. The framework integrates language and image modalities via a simple projection and includes an enhanced training objective that aligns the models of different sizes. Empirical results on four Amazon benchmarks show superior ranking performance, with notable gains on sparser datasets, and the resulting model series offers substantial parameter savings compared to training multiple independent models. This work provides a practical pathway to scalable, versatile recommender systems that adapt to diverse deployment constraints.
Abstract
Despite recent advancements in language and vision modeling, integrating rich multimodal knowledge into recommender systems continues to pose significant challenges. This is primarily due to the need for efficient recommendation, which requires adaptive and interactive responses. In this study, we focus on sequential recommendation and introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec). Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions. To integrate item features from diverse modalities, fMRLRec employs a simple mapping to project multimodal item features into an aligned feature space. Additionally, we design an efficient linear transformation that embeds smaller features into larger ones, substantially reducing memory requirements for large-scale training on recommendation data. Combined with improved state space modeling techniques, fMRLRec scales to different dimensions and requires only one-time training to produce multiple models tailored to various granularities. We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets, where it consistently achieves superior performance over state-of-the-art baseline methods. We make our code and data publicly available at https://github.com/yueqirex/fMRLRec.
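To make the idea of embedding smaller features into larger ones concrete, the following is a minimal sketch of one plausible nesting scheme, not the fMRLRec operator itself, whose exact form is not given in this section. It assumes that a model of size s uses only the first s input features and the top-left s x s block of a single shared weight matrix. The names make_weight, nested_linear, param_counts, and the size list SIZES are hypothetical and chosen only for illustration.

```python
import random


def make_weight(d_in, d_out, seed=0):
    """Create one shared weight matrix (list of rows) with small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in)]


def nested_linear(x, weight, size):
    """Apply the top-left size x size block of `weight` to the first `size` features.

    Assumption: a smaller model is a slice of the larger one, so no
    separate parameters are stored for it.
    """
    x_small = x[:size]
    return [
        sum(x_small[i] * weight[i][j] for i in range(size))
        for j in range(size)
    ]


def param_counts(sizes):
    """Compare parameters of one shared matrix vs. independent matrices per size."""
    shared = max(sizes) ** 2
    independent = sum(s * s for s in sizes)
    return shared, independent


# Hypothetical granularities; the section does not state the sizes used.
SIZES = [8, 16, 32, 64]

W = make_weight(64, 64)
x = [i / 64 for i in range(64)]

# One forward pass per granularity, all reading from the same matrix W.
outputs = {s: nested_linear(x, W, s) for s in SIZES}

shared, independent = param_counts(SIZES)
print(f"shared parameters: {shared}, independent parameters: {independent}")
```

Under these assumptions, the shared matrix holds 4096 weights, whereas four independently trained square layers of sizes 8, 16, 32, and 64 would hold 5440 in total. Deploying a smaller model then amounts to slicing the trained weights rather than training again.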
