xMTF: A Formula-Free Model for Reinforcement-Learning-Based Multi-Task Fusion in Recommender Systems
Yang Cao, Changhao Zhang, Xiaoshuang Chen, Kaiqiao Zhan, Ben Wang
TL;DR
The paper addresses how recommender systems fuse multiple user feedback signals, moving beyond pre-defined fusion formulas. It introduces xMTF, a formula-free framework built on Monotonic Fusion Cells (MFCs). Guided by the Sprecher Representation Theorem, MFCs can express any monotonic fusion function, which enables personalized, expressive fusion without a fixed formula. To train over this larger, more flexible search space, the authors propose a two-stage hybrid (TSH) method: an RL-based outer stage with few parameters that controls the fusion, and a supervised inner stage with many parameters that learns from the outer stage through knowledge transfer and ranking-based losses. Offline experiments on the KuaiRand dataset and online experiments on a platform with over 100 million users show that xMTF outperforms both formula-based and existing RL-based MTF methods, improving long-term user satisfaction as measured by Total Watch Time and related engagement metrics. The model has been deployed in production. Overall, the work expands the MTF search space with interpretable monotonic transformations, offering practical gains in long-term recommender performance.
Abstract
Recommender systems need to optimize various types of user feedback, e.g., clicks, likes, and shares. A typical recommender system handling multiple types of feedback has two components: a multi-task learning (MTL) module, predicting feedback such as click-through rate and like rate; and a multi-task fusion (MTF) module, integrating these predictions into a single score for item ranking. MTF is essential for ensuring user satisfaction, as it directly influences recommendation outcomes. Recently, reinforcement learning (RL) has been applied to MTF tasks to improve long-term user satisfaction. However, existing RL-based MTF methods are formula-based methods, which only adjust limited coefficients within pre-defined formulas. The pre-defined formulas restrict the RL search space and become a bottleneck for MTF. To overcome this, we propose a formula-free MTF framework. We demonstrate that any suitable fusion function can be expressed as a composition of single-variable monotonic functions, as per the Sprecher Representation Theorem. Leveraging this, we introduce a novel learnable monotonic fusion cell (MFC) to replace pre-defined formulas. We call this new MFC-based model eXtreme MTF (xMTF). Furthermore, we employ a two-stage hybrid (TSH) learning strategy to train xMTF effectively. By expanding the MTF search space, xMTF outperforms existing methods in extensive offline and online experiments.
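To make the contrast between formula-based and formula-free fusion concrete, the following is a minimal Python sketch and not the authors' MFC implementation. It assumes a weighted product as one plausible pre-defined formula, and it simplifies the Sprecher-style composition to a single outer monotonic function applied to a sum of per-task monotonic functions. All class and function names are hypothetical, and the piecewise-linear parameterization is an illustrative choice only.

```python
import math


def formula_based_score(preds, weights):
    """Assumed example of a pre-defined formula: prod_i p_i ** w_i.

    Only the exponents `weights` can be tuned; the functional form is fixed.
    """
    return math.prod(p ** w for p, w in zip(preds, weights))


class MonotonePiecewiseLinear:
    """Hypothetical learnable non-decreasing map on [0, 1].

    The function is defined by non-negative increments on equally spaced
    knots, so it is monotonic by construction.
    """

    def __init__(self, increments):
        if any(d < 0 for d in increments):
            raise ValueError("increments must be non-negative")
        self.n = len(increments)
        self.knots_y = [0.0]
        for d in increments:
            self.knots_y.append(self.knots_y[-1] + d)

    def __call__(self, x):
        x = min(max(x, 0.0), 1.0)          # clamp input to [0, 1]
        pos = x * self.n                    # position in knot units
        k = min(int(pos), self.n - 1)       # index of the active segment
        frac = pos - k                      # offset inside the segment
        left, right = self.knots_y[k], self.knots_y[k + 1]
        return left + frac * (right - left)


def formula_free_score(preds, inner_fns, outer_fn=lambda s: s):
    """Simplified formula-free fusion: outer_fn(sum_i inner_fns[i](p_i)).

    If every inner function and the outer function are non-decreasing,
    the fused score is non-decreasing in each prediction.
    """
    return outer_fn(sum(f(p) for f, p in zip(inner_fns, preds)))


if __name__ == "__main__":
    click_fn = MonotonePiecewiseLinear([0.1, 0.3, 0.6])
    like_fn = MonotonePiecewiseLinear([0.5, 0.5])
    print(formula_based_score([0.5, 0.2], [1.0, 2.0]))
    print(formula_free_score([0.3, 0.4], [click_fn, like_fn]))
```

In this sketch the formula-based scorer exposes only a handful of coefficients, whereas the formula-free scorer can reshape each per-task curve freely as long as it stays monotonic, which is the sense in which the search space is larger.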
