Efficient Multi-task Prompt Tuning for Recommendation

Ting Bai; Le Huang; Yue Yu; Cheng Yang; Cheng Hou; Zhe Zhao; Chuan Shi

TL;DR

A novel two-stage prompt-tuning MTL framework (MPT-Rec) is proposed to address the task irrelevance and training efficiency problems in multi-task recommender systems. It outperforms the SOTA multi-task learning method on three real-world datasets.

Abstract

With the expansion of business scenarios, real-world recommender systems face the challenge of handling constantly emerging new tasks within multi-task learning (MTL) frameworks. In this paper, we aim to improve the generalization ability of multi-task recommendation when dealing with new tasks. We find that, in most multi-task learning methods, joint training enhances the performance of the new task but negatively impacts existing tasks. Moreover, re-training with each new task increases training costs, which limits the generalization ability of multi-task recommendation models. Based on this consideration, we aim to design a suitable sharing mechanism among different tasks while maintaining joint optimization efficiency in new task learning. A novel two-stage prompt-tuning MTL framework (MPT-Rec) is proposed to address the task irrelevance and training efficiency problems in multi-task recommender systems. Specifically, we disentangle the task-specific and task-sharing information in the multi-task pre-training stage, then use task-aware prompts to transfer knowledge from other tasks to the new task effectively. By freezing the parameters of the pre-training tasks, MPT-Rec avoids the negative impact that the new task may bring and greatly reduces training costs. Extensive experiments on three real-world datasets show the effectiveness of our proposed multi-task learning framework. MPT-Rec achieves the best performance compared to the SOTA multi-task learning method. In new task learning, it maintains comparable model performance while vastly improving training efficiency (i.e., training at most 10% of the parameters required by full training).
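The freezing idea in the abstract can be illustrated with a minimal, framework-free sketch. Everything here is an assumption made for illustration: the `Param` class, the parameter names, the toy sizes, and the plain SGD update are hypothetical and are not taken from MPT-Rec. The sketch only shows that an update step which skips frozen parameters leaves the pre-trained tasks untouched and trains a small fraction of the total parameters.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """A named parameter vector plus a flag saying whether it is trainable."""
    name: str
    values: list
    trainable: bool = True


def freeze(params, prefix):
    """Mark every parameter whose name starts with `prefix` as frozen."""
    for p in params:
        if p.name.startswith(prefix):
            p.trainable = False


def sgd_step(params, grads, lr=0.1):
    """Apply one plain SGD update, skipping frozen parameters."""
    for p in params:
        if not p.trainable:
            continue
        p.values = [v - lr * g for v, g in zip(p.values, grads[p.name])]


def trainable_fraction(params):
    """Share of scalar parameters that would be updated during training."""
    total = sum(len(p.values) for p in params)
    trainable = sum(len(p.values) for p in params if p.trainable)
    return trainable / total


# Toy sizes chosen for illustration only (hypothetical, not the paper's numbers).
params = [
    Param("pretrained.shared_expert", [0.5] * 6),
    Param("pretrained.task_expert", [0.2] * 3),
    Param("new_task.prompt", [0.0] * 1),
]
freeze(params, "pretrained.")

# Pretend every parameter received a gradient of 1.0 from the new task's loss.
grads = {p.name: [1.0] * len(p.values) for p in params}
sgd_step(params, grads, lr=0.1)
```

After the step, only `new_task.prompt` has changed, so predictions of the pre-trained tasks cannot be degraded by the new task in this toy setting.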

Paper Structure (28 sections, 15 equations, 6 figures, 4 tables)

Introduction
Related Work
Multi-Task Learning
Multi-Task Generalization
Multi-Task Fine-tuning
Methodology
The General Framework
Multi-Task Pre-training
Learning Disentangled Information
Learning Fusion Information
Multi-Task Prompt-tuning
Task-Specific Information Transfer
Task-Aware Prompt Tuning
Efficiency Analysis
Experiments
...and 13 more sections

Figures (6)

Figure 1: Experimental results of the multi-task learning method MMOE on the Census-income dataset. Tasks T1, T2, and T3 predict the "income", "marital status", and "sex" labels, respectively. Compared with single-task AUC, jointly learning T1 and T2 improves both tasks. When the new task T3 is added, MTL improves performance on T3 but damages T1 and T2, which had been optimized in the two-task stage.
Figure 2: The overall architecture of our proposed MTL framework MPT-Rec. It consists of two components: the multi-task pre-training component and the multi-task prompt-tuning component. In the pre-training component, a generative adversarial network is designed to disentangle the task-specific and task-sharing information by using the task-sharing expert as the generator and the task classifier as the discriminator. In the prompt-tuning component, the parameters of the pre-trained model are frozen, and useful knowledge is transferred to the new task by a prompt mechanism. Parameters are labeled "fire" if they need to be trained and "ice" if they are frozen.
Figure 3: Fine-tuning operations on the Shared Bottom, MMOE, and PLE methods. "fire" means the parameters need to be trained, and "ice" means they are frozen.
Figure 4: Performance comparison of variant models on new task prediction on the Census-income and Ali-CCP datasets. "Share" and "Specific" denote variants that use only task-shared or only task-specific information, respectively. "-GAN" denotes the variant that removes the GAN part and therefore does not disentangle the task-specific and task-sharing information.
Figure 5: Performance comparison of different methods to combine the specific information from other tasks. "FW" uses fixed weights to fuse the task-specific knowledge in the pre-training phase. "TES" calculates the fusion weights according to task embedding similarity.
...and 1 more figures
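The two fusion strategies named in the Figure 5 caption can be contrasted with a small sketch. The captions do not give the actual formulas, so the choices below are assumptions: "TES" is illustrated as a softmax over cosine similarities between a new-task embedding and the pre-trained task embeddings, and "FW" as a constant weight vector. The function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tes_weights(new_task_emb, old_task_embs):
    """Assumed TES variant: softmax of cosine similarity to each old task."""
    return softmax([cosine(new_task_emb, e) for e in old_task_embs])


def fuse(weights, specific_reprs):
    """Weighted sum of the task-specific representations of the old tasks."""
    dim = len(specific_reprs[0])
    return [sum(w * r[i] for w, r in zip(weights, specific_reprs))
            for i in range(dim)]


# Toy inputs (hypothetical): the new task resembles old task 0, not old task 1.
new_emb = [1.0, 0.0]
old_embs = [[1.0, 0.0], [0.0, 1.0]]
specific = [[2.0, 0.0], [0.0, 2.0]]

w_tes = tes_weights(new_emb, old_embs)   # similarity-driven weights
w_fw = [0.5, 0.5]                        # fixed weights, independent of the task

fused_tes = fuse(w_tes, specific)
fused_fw = fuse(w_fw, specific)
```

In this toy case the similarity-driven weights favor the more related old task, while the fixed weights treat both old tasks equally.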
