A Parameter Update Balancing Algorithm for Multi-task Ranking Models in Recommendation Systems

Jun Yuan, Guohao Cai, Zhenhua Dong

TL;DR

This work identifies a fundamental gap in gradient-focused multi-task optimization for ranking models: the joint gradients produced by gradient-balancing methods do not reliably translate into optimal parameter updates for shared parameters when using momentum-based optimizers like Adam. It introduces Parameter Update Balancing (PUB), a novel framework that optimizes how per-task updates are combined through a convex formulation, leading to Pareto-optimal joint updates. The authors develop an efficient CCP-based solver to compute task weights, demonstrate consistent offline improvements on public CTR/CTCVR benchmarks, and report positive online impact in a real-world Huawei deployment. The results show PUB’s broad applicability across ranking and computer vision tasks and its flexibility to integrate with update manipulation methods, highlighting its practical potential for industrial-scale multi-task learning systems.

Abstract

Multi-task ranking models have become essential for modern real-world recommendation systems. While most recommendation research focuses on designing sophisticated models for specific scenarios, improving multi-task ranking models across various scenarios remains a significant challenge. Training all tasks naively can result in inconsistent learning, which motivates the development of multi-task optimization (MTO) methods. Conventional methods assume that the optimal joint gradient on shared parameters leads to optimal parameter updates. However, the actual update to the model parameters may deviate significantly from the gradient when using momentum-based optimizers such as Adam, and we design and run statistical experiments to support this observation. In this paper, we propose a novel Parameter Update Balancing algorithm for multi-task optimization, denoted PUB. In contrast to traditional MTO methods, which are based on gradient-level or loss-level task fusion, PUB is the first work to optimize multiple tasks through parameter update balancing. Comprehensive experiments on benchmark multi-task ranking datasets demonstrate that PUB consistently improves several multi-task backbones and achieves state-of-the-art performance. Additionally, experiments on benchmark computer vision datasets show the potential of PUB in a variety of multi-task learning scenarios. Furthermore, we deployed our method for an industrial evaluation on a real-world commercial platform, HUAWEI AppGallery, where PUB significantly enhances the online multi-task ranking model, efficiently managing the primary traffic of a crucial channel.
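
The gap between gradient and update described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' statistical experiment: the gradients are synthetic noise with uneven per-coordinate scales, the hyperparameters are the commonly used Adam defaults, and the helper names `adam_step` and `cosine` are hypothetical. It only shows that the direction of an Adam update need not match the direction of the current negative gradient.

```python
import numpy as np

def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: returns the parameter update and the new moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    update = -lr * m_hat / (np.sqrt(v_hat) + eps)
    return update, m, v

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dim = 64
scales = np.linspace(0.01, 10.0, dim)           # assumption: uneven gradient scales
m, v = np.zeros(dim), np.zeros(dim)

sims = []
for t in range(1, 201):
    grad = rng.normal(size=dim) * scales        # synthetic gradient, not real training data
    update, m, v = adam_step(grad, m, v, t)
    # Plain SGD would give cosine(-grad, update) == 1 at every step.
    sims.append(cosine(-grad, update))

mean_sim = float(np.mean(sims))
print(f"mean cosine similarity between -gradient and Adam update: {mean_sim:.3f}")
```

Under plain SGD the similarity would be exactly 1 at every step; with Adam, momentum and per-coordinate scaling both rotate the update away from the current gradient, which is the effect a gradient-level balancing method does not see.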

Paper Structure (20 sections, 5 equations, 4 figures, 7 tables, 1 algorithm)

Figures (4)

  • Figure 1: Visualization of the difference between a conventional gradient-balancing method (GBM) and PUB. We show the joint gradient obtained by GBMs (blue) and the parameter update (red). The yellow vectors illustrate the result scaled by the factor $\alpha$, and $f$ denotes the optimizer function.
  • Figure 2: Toy experiments. Optimization trajectories in loss space; the optimizer is Adam in all experiments. Black dots (•) mark 5 different initializations, and their trajectories are colored from orange to purple. CAGrad, IMTL, NashMTL and FAMO converge to different Pareto-stationary points depending on the initial point and fail to converge to the optimal solution $\mathcal{L}^{*}$ (star mark). In contrast, PUB exhibits more robust convergence behavior and reaches the optimal solution from all initializations. See Navon et al. (2022, Nash-MTL) for details.
  • Figure 3: Cosine similarity between the gradient and the update for task-specific and shared parameters on the AliExpress_ES dataset. The average cosine similarity of the task-specific parameters is significantly higher than that of the shared parameters ($p < 0.01$).
  • Figure 4: Confusion matrix of the average AUC and the Diff metric. We conducted 144 experiments, yielding 144 pairs of $x$ and $y$. A chi-square test shows an inverse correlation between average AUC and Diff ($p < 0.001$).
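
The contrast drawn in Figure 1 can be made concrete with a toy calculation. The sketch below compares applying the optimizer function $f$ to an $\alpha$-weighted joint gradient against combining per-task updates $f(g_i)$ with the same weights. It rests on several assumptions: $f$ is a single Adam step from zero-initialized moments, the two task gradients and the weights `alpha` are picked by hand, and the helper name `adam_first_update` is hypothetical. It does not implement PUB's solver, which chooses the weights through a convex formulation.

```python
import numpy as np

def adam_first_update(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam update at step t=1 from zero moments; plays the role of f here."""
    m_hat = ((1 - beta1) * grad) / (1 - beta1)        # bias-corrected first moment
    v_hat = ((1 - beta2) * grad ** 2) / (1 - beta2)   # bias-corrected second moment
    return -lr * m_hat / (np.sqrt(v_hat) + eps)

# Hand-picked, conflicting task gradients on two shared parameters (assumption).
g1 = np.array([1.0, -0.1])
g2 = np.array([-0.5, 1.0])
alpha = np.array([0.7, 0.3])                          # fixed weights, not solved for

# Gradient-level fusion: weight the gradients, then apply the optimizer.
update_of_joint_gradient = adam_first_update(alpha[0] * g1 + alpha[1] * g2)

# Update-level fusion: apply the optimizer per task, then weight the updates.
joint_of_task_updates = (alpha[0] * adam_first_update(g1)
                         + alpha[1] * adam_first_update(g2))

print("f(sum_i alpha_i g_i)  =", update_of_joint_gradient)
print("sum_i alpha_i f(g_i)  =", joint_of_task_updates)
```

Because $f$ is nonlinear for Adam, the two results differ even with identical weights; in this example they disagree on the sign of the second coordinate. With plain SGD, where $f(g) = -\eta g$ is linear, the two would coincide.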

Theorems & Definitions (2)

  • Claim 1
  • proof