Efficient Multi-Task Learning via Generalist Recommender

Luyang Wang, Cangcheng Tang, Chongyang Zhang, Jun Ruan, Kai Huang, Jason Dai

TL;DR

This work tackles scalability challenges in multi-task recommender systems by introducing Generalist Recommender (GRec), a multi-task, multi-modal architecture that uses sparse mixture-of-experts (MoE) and a novel task-sentence routing mechanism to scale task capacity without degrading inference speed. GRec combines a wide-and-deep front end, a Parallel Transformer, and a Task-Sentence MoE layer to route inputs to task-aware experts, enabling cross-task generalization while maintaining efficiency through routing strategies such as top-$k$ expert selection. The authors report offline improvements on AliExpress and internal transaction datasets, plus online A/B test improvements across multiple use cases, including CVR-related metrics. The results support the practical viability of deploying GRec in large-scale production recommender systems, highlighting the value of task-aware routing and multi-modal processing for multi-task learning in industry contexts.

Abstract

Multi-task learning (MTL) is a common machine learning technique that allows the model to share information across different tasks and improve the accuracy of recommendations for all of them. Many existing MTL implementations suffer from scalability issues as the training and inference performance can degrade with the increasing number of tasks, which can limit production use case scenarios for MTL-based recommender systems. Inspired by the recent advances of large language models, we developed an end-to-end efficient and scalable Generalist Recommender (GRec). GRec takes comprehensive data signals by utilizing NLP heads, parallel Transformers, as well as a wide and deep structure to process multi-modal inputs. These inputs are then combined and fed through a newly proposed task-sentence level routing mechanism to scale the model capabilities on multiple tasks without compromising performance. Offline evaluations and online experiments show that GRec significantly outperforms our previous recommender solutions. GRec has been successfully deployed on one of the largest telecom websites and apps, effectively managing high volumes of online traffic every day.
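The abstract names a task-sentence level routing mechanism but does not spell out its form here, so the following is only a minimal NumPy sketch of top-$k$ sparse MoE routing under stated assumptions, not the authors' implementation. It assumes the routing decision is made once per example from a mean-pooled sequence representation concatenated with a task embedding, and that each expert is a single linear map. The names `top_k_route`, `task_sentence_moe`, `w_gate`, and `task_emb` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def top_k_route(router_input, w_gate, k):
    """Pick k experts per row and return their ids and mixing weights.

    router_input: (batch, d_in), w_gate: (d_in, num_experts).
    """
    logits = router_input @ w_gate                       # (batch, num_experts)
    top_ids = np.argsort(-logits, axis=-1)[:, :k]        # k largest logits per row
    top_logits = np.take_along_axis(logits, top_ids, axis=-1)
    weights = softmax(top_logits, axis=-1)               # renormalize over the selected experts
    return top_ids, weights


def task_sentence_moe(tokens, task_emb, w_gate, experts, k=2):
    """Route each (task, sequence) pair as a whole to k experts.

    tokens: (batch, seq, d), task_emb: (batch, d_task),
    experts: (num_experts, d, d) stack of linear experts (an assumption).
    """
    sentence = tokens.mean(axis=1)                       # assumed pooling: mean over the sequence
    router_input = np.concatenate([sentence, task_emb], axis=-1)
    ids, weights = top_k_route(router_input, w_gate, k)
    out = np.zeros_like(tokens)
    for b in range(tokens.shape[0]):
        for j in range(k):
            # Every token of example b goes through the same selected experts.
            out[b] += weights[b, j] * (tokens[b] @ experts[ids[b, j]])
    return out, ids, weights


# Example usage with random data (shapes are illustrative only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 3, 8))
task_emb = rng.normal(size=(4, 5))
w_gate = rng.normal(size=(8 + 5, 6))
experts = rng.normal(size=(6, 8, 8))
out, ids, weights = task_sentence_moe(tokens, task_emb, w_gate, experts, k=2)
```

Because only `k` experts run per example, the per-example compute in this sketch depends on `k` rather than on the total number of experts, which is the general property sparse MoE relies on to add capacity without a matching increase in inference cost.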

Paper Structure

This paper contains 11 sections, 8 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The framework of our GRec model.
  • Figure 2: The architecture of wide and deep layer in GRec.
  • Figure 3: Differences between token-level, sentence-level, task-level and task-sentence-level sparse MoEs.
  • Figure 4: The trend of AUC changes when the number of experts selected and the total number of experts are different.