Customizing Language Models with Instance-wise LoRA for Sequential Recommendation

Xiaoyu Kong; Jiancan Wu; An Zhang; Leheng Sheng; Hui Lin; Xiang Wang; Xiangnan He

TL;DR

The paper targets negative transfer in LLM-based sequential recommendation by replacing uniform LoRA fine-tuning with Instance-wise LoRA (iLoRA). iLoRA integrates the Mixture of Experts (MoE) framework with LoRA: a gating network takes each user sequence's representation and produces expert participation weights, so the LoRA update is customized per sequence, with less than a 1% relative increase in trainable parameters over standard LoRA. On LastFM, MovieLens, and Steam, iLoRA reaches state-of-the-art hit ratio, with an average relative improvement of 11.4% over basic LoRA; ablations support the sequence-guided gating and indicate that four experts work best. By modeling per-sequence variability explicitly, the approach reduces negative transfer while keeping fine-tuning parameter-efficient.

Abstract

Sequential recommendation systems predict the next interaction item based on users' past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches apply LLMs to sequential recommendation. A common paradigm is to convert user behavior sequences into instruction data and fine-tune the LLM with parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA). However, applying LoRA uniformly across diverse user behaviors is insufficient to capture individual variability, resulting in negative transfer between disparate sequences. To address these challenges, we propose Instance-wise LoRA (iLoRA). We treat the sequential recommendation task as a form of multi-task learning, integrating LoRA with the Mixture of Experts (MoE) framework. This approach encourages different experts to capture various aspects of user behavior. Additionally, we introduce a sequence-representation-guided gate function that generates customized expert participation weights for each user sequence, which allows dynamic parameter adjustment for instance-wise recommendations. In sequential recommendation, iLoRA achieves an average relative improvement of 11.4% over basic LoRA in the hit ratio metric, with less than a 1% relative increase in trainable parameters. Extensive experiments on three benchmark datasets demonstrate the effectiveness of iLoRA, highlighting its superior performance compared to existing methods in mitigating negative transfer and improving recommendation accuracy. Our data and code are available at https://github.com/AkaliKong/iLoRA.
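
The mechanism described in the abstract (experts formed from the LoRA matrices, a gate driven by the sequence representation, and a weighted aggregation of experts) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The class name `InstanceWiseLoRASketch`, the even split of the rank across experts, the softmax gate, the random initialization, and all dimensions are assumptions made for illustration; the usual LoRA scaling factor is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class InstanceWiseLoRASketch:
    """Hypothetical illustration: LoRA split into experts, mixed per sequence."""

    def __init__(self, d_in, d_out, d_seq, rank=8, num_experts=4, seed=0):
        assert rank % num_experts == 0, "assumed: rank is split evenly"
        rng = random.Random(seed)
        sub_rank = rank // num_experts
        # Expert k owns a slice of the low-rank pair:
        # A_k is (sub_rank x d_in), B_k is (d_out x sub_rank).
        # Both are random here so the demo output is non-zero; standard
        # LoRA usually starts one of the two factors at zero.
        self.A = [random_matrix(sub_rank, d_in, rng) for _ in range(num_experts)]
        self.B = [random_matrix(d_out, sub_rank, rng) for _ in range(num_experts)]
        # Gate maps a sequence representation to one score per expert.
        self.gate = random_matrix(num_experts, d_seq, rng)

    def expert_weights(self, seq_repr):
        """Expert participation weights for one user sequence."""
        return softmax(matvec(self.gate, seq_repr))

    def delta(self, x, seq_repr):
        """Instance-wise low-rank update: sum_k w_k * B_k (A_k x)."""
        weights = self.expert_weights(seq_repr)
        out = [0.0] * len(self.B[0])
        for w, a_k, b_k in zip(weights, self.A, self.B):
            update = matvec(b_k, matvec(a_k, x))
            out = [o + w * u for o, u in zip(out, update)]
        return out

    def parameter_counts(self):
        """Return (expert parameters, gate parameters)."""
        lora = sum(len(m) * len(m[0]) for m in self.A + self.B)
        gate = len(self.gate) * len(self.gate[0])
        return lora, gate


layer = InstanceWiseLoRASketch(d_in=6, d_out=5, d_seq=3)
weights_a = layer.expert_weights([0.2, -0.1, 0.5])
weights_b = layer.expert_weights([-0.4, 0.3, 0.1])
update = layer.delta([1.0] * 6, [0.2, -0.1, 0.5])
```

Under these assumptions the experts together hold exactly as many parameters as a single LoRA pair of the same total rank, so the gate is the only addition. That is one plausible reading of how the reported increase in trainable parameters stays small, not a statement about the released code.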

Paper Structure (24 sections, 12 equations, 5 figures, 1 table)

This paper contains 24 sections, 12 equations, 5 figures, 1 table.

Introduction
Preliminary
Methodology
Instance-wise Generation for Sequential Recommendation
Instance-wise LoRA with the Mixture of Experts Concept
Splitting Low-Rank Matrices into Experts
Generating Instance-wise Attentions over Experts
Aggregating Mixture of Experts as Instance-wise LoRA
Experiments
Investigating Rationale of Instance-wise LoRA (RQ1)
Negative Transfer in Uniform LoRA & Instance-wise LoRA
Expert Showcase in Instance-wise LoRA
Performance Comparison (RQ2)
Ablation Study (RQ3)
Effects of Gating Network
...and 9 more sections

Figures (5)

Figure 1: Gradient similarity of LoRA modules across training steps. The sequence dataset is partitioned into 8 clusters using Euclidean distance, with hierarchical clustering applied to reorder clusters, so that clusters closer in the collaborative space are also closer together in the heatmap. Gradient similarity is used to assess the geometric characteristics of the loss, with darker cells indicating higher similarity. In the case study on the right, dashed lines connect similar items, while solid lines link identical items. Users with a gradient similarity of 0.86 share a strong interest in thriller movies, while those with -0.75 cosine similarity show no clear preference alignment.
Figure 2: The iLoRA framework, which integrates the idea of MoE with LoRA, to implement sequence-customized activation patterns for various sequences.
Figure 3: Two panels separately show gradient similarities of LLaRA and iLoRA, with sequences partitioned into 8 clusters; a third panel shows the attention scores over four experts for ten sequences.
Figure 4: One panel illustrates the performance of iLoRA w.r.t. HitRatio@1 across different datasets with varying numbers of experts. A second panel shows HitRatio@1 across training epochs on the Steam dataset with varying numbers of experts.
Figure 5: Effects of iLoRA's components
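
The two quantities these captions rely on, cosine similarity between gradients and HitRatio@1, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, gradients are assumed to be flattened into plain lists of floats (one per cluster), and HitRatio@1 is assumed to follow its usual definition, the fraction of sequences whose top-1 prediction equals the ground-truth next item.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for an all-zero gradient
    return dot / (norm_u * norm_v)


def similarity_matrix(cluster_grads):
    """Pairwise cosine similarities, one row and column per cluster."""
    return [
        [cosine_similarity(g_i, g_j) for g_j in cluster_grads]
        for g_i in cluster_grads
    ]


def hit_ratio_at_1(predicted_items, target_items):
    """Fraction of sequences whose top-1 prediction is the true next item."""
    hits = sum(1 for p, t in zip(predicted_items, target_items) if p == t)
    return hits / len(target_items)


# Toy gradients for three hypothetical clusters.
grads = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
heatmap = similarity_matrix(grads)
score = hit_ratio_at_1(["a", "b", "c", "d"], ["a", "x", "c", "y"])
```

In the toy data, the first two clusters have aligned gradients and the third points the opposite way, which mirrors the contrast Figure 1 draws between positively and negatively correlated user groups.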
