Personalized Parameter-Efficient Fine-Tuning of Foundation Models for Multimodal Recommendation
Sunwoo Kim, Hyunjin Hwang, Kijung Shin
TL;DR
This paper tackles the problem that item embeddings produced by multimodal foundation models are not conditioned on user interests, which limits personalization in multimodal recommendation. It introduces PerPEFT, a personalized PEFT framework that partitions users into interest-based groups and assigns a distinct PEFT module to each group, enabling group-specific emphasis on item aspects while keeping most of the foundation model fixed. A specialized training strategy using group-specific negative sampling further strengthens the learning of purchase-relevant, group-aligned item features. Across four real-world datasets and multiple PEFT backbones, PerPEFT consistently outperforms baselines (e.g., Global PEFT), with gains of up to 15.3% in NDCG@20, while its added parameters amount to only about 1.3% of the foundation model's parameter count. The work demonstrates that personalized, group-wise adapters can significantly improve multimodal recommendation with modest parameter overhead, and it provides code and datasets for reproducibility.
Abstract
In recent years, substantial research has integrated multimodal item metadata into recommender systems, often by using pre-trained multimodal foundation models to encode such data. Since these models are not originally trained for recommendation tasks, recent works efficiently adapt them via parameter-efficient fine-tuning (PEFT). However, even with PEFT, item embeddings from multimodal foundation models remain user-blind: item embeddings are not conditioned on user interests, despite the fact that users with diverse interests attend to different item aspects. To address this limitation, we propose PerPEFT, a personalized PEFT strategy for multimodal recommendation. Specifically, PerPEFT groups users by interest and assigns a distinct PEFT module to each group, enabling each module to capture the fine-grained item aspects most predictive of that group's purchase decisions. We further introduce a specialized training technique that strengthens this user-group conditioning. Notably, PerPEFT is PEFT-agnostic and can be paired with any PEFT method applicable to multimodal foundation models. Through extensive experiments, we show that PerPEFT (1) outperforms the strongest baseline by up to 15.3% (NDCG@20) and (2) delivers consistent gains across diverse PEFT variants. It is noteworthy that, even with personalization, PEFT remains lightweight, adding only 1.3% of the parameter count of the foundation model. We provide our code and datasets at https://github.com/kswoo97/PerPEFT.
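
To make the group-wise idea concrete, the following is a minimal, illustrative sketch and not the released PerPEFT implementation. It assumes a toy frozen linear layer standing in for the foundation model, a LoRA-style low-rank adapter as the PEFT module, nearest-centroid assignment as the user grouping step, and negatives drawn from a per-group item pool as one possible reading of group-specific negative sampling. All names (`GroupAdaptedEncoder`, `assign_group`, `sample_group_negative`) and all of these design choices are hypothetical; the abstract does not specify them.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix used for toy weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GroupAdaptedEncoder:
    """Frozen shared weight W plus one low-rank adapter (B_g @ A_g) per user group.

    Assumption: a LoRA-style adapter stands in for "a PEFT module"; the source
    states only that the approach is PEFT-agnostic.
    """

    def __init__(self, d_in, d_out, rank, num_groups, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)  # frozen, shared by all groups
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(num_groups)]
        # B starts at zero, so every group initially reproduces the frozen output.
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_groups)]

    def encode(self, item_features, group):
        """Item embedding conditioned on the user group: W x + B_g (A_g x)."""
        base = matvec(self.W, item_features)
        delta = matvec(self.B[group], matvec(self.A[group], item_features))
        return [b + d for b, d in zip(base, delta)]

    def adapter_overhead(self):
        """Total adapter parameters divided by frozen backbone parameters."""
        backbone = len(self.W) * len(self.W[0])
        adapters = sum(
            len(A) * len(A[0]) + len(B) * len(B[0]) for A, B in zip(self.A, self.B)
        )
        return adapters / backbone


def assign_group(user_interest, centroids):
    """Hypothetical grouping rule: index of the nearest interest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda g: sq_dist(user_interest, centroids[g]))


def sample_group_negative(group_item_pool, user_positives, rng):
    """Hypothetical sampler: a negative from the group's item pool that the user
    has not interacted with. Raises IndexError if no candidate remains."""
    candidates = [item for item in group_item_pool if item not in user_positives]
    return rng.choice(candidates)
```

In this sketch, the same item receives a different embedding depending on which group's adapter is applied once the adapters have been trained, while `W` is never updated. The `adapter_overhead` method shows how the extra parameter fraction could be measured; the toy sizes here say nothing about the 1.3% figure reported for the actual model.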
