MMRec: Simplifying Multimodal Recommendation
Xin Zhou
TL;DR
The paper tackles the complexity and reproducibility challenges in multimodal recommendation that arise from heterogeneous preprocessing and modality fusion. It proposes MMRec, an open-source, configurable toolbox that unifies data preprocessing, multimodal information fusion, model training, and evaluation in a single benchmarking framework. Key contributions include support for 10+ multimodal models across four modalities (Text, Image, Audio, Video), a unified training interface, and evaluation with built-in grid search so that models are compared fairly. The work aims to accelerate research and deployment by reducing implementation effort; documentation and code are available in the authors' GitHub repository.
Abstract
This paper presents MMRec, an open-source toolbox for multimodal recommendation. MMRec simplifies and canonicalizes the process of implementing and comparing multimodal recommendation models. Its objective is to provide a unified and configurable arena that minimizes the effort of implementing and testing such models. It supports multimodal models, ranging from traditional matrix factorization to modern graph-based algorithms, that can fuse information from multiple modalities simultaneously. Our documentation, examples, and source code are available at \url{https://github.com/enoche/MMRec}.
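To make the idea of a configurable arena with grid-search evaluation concrete, the following is a minimal sketch in plain Python. It is not MMRec's API: `grid_search`, `toy_train_and_eval`, and the hyperparameter names `learning_rate` and `reg_weight` are hypothetical names chosen for illustration, and the scoring function is a stand-in for actually training and validating a recommendation model.

```python
from itertools import product


def grid_search(train_and_eval, search_space):
    """Score every hyperparameter combination and return the best one.

    train_and_eval: callable taking a dict of hyperparameters and returning
        a validation score (higher is better). Hypothetical interface.
    search_space: dict mapping each hyperparameter name to candidate values.
    """
    names = sorted(search_space)
    results = []
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, train_and_eval(params)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_train_and_eval(params):
    """Stand-in for real training: the score peaks at lr=0.001, reg=0.01."""
    lr_penalty = abs(params["learning_rate"] - 0.001) * 100
    reg_penalty = abs(params["reg_weight"] - 0.01)
    return -(lr_penalty + reg_penalty)


# Assumed search space; real values would come from a configuration file.
search_space = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "reg_weight": [0.0, 0.01, 0.1],
}

best_params, best_score, results = grid_search(toy_train_and_eval, search_space)
print(best_params)
```

Because every model passes through the same search routine and the same scoring function, the tuning budget is identical across models, which is the condition a fair comparison relies on.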
