MultiBalance: Multi-Objective Gradient Balancing in Industrial-Scale Multi-Task Recommendation System

Yun He, Xuxing Chen, Jiayi Xu, Renqin Cai, Yiling You, Jennifer Cao, Minhui Huang, Liu Yang, Yiqun Liu, Xiaoyi Liu, Rong Jin, Sem Park, Bo Long, Xue Feng

TL;DR

This work proposes MultiBalance, a gradient balancing approach suited to industrial-scale multi-task recommendation systems. It achieves significant gains at neutral training cost, measured in Queries Per Second (QPS), whereas prior methods that balance per-task gradients of shared parameters incur 70~80% QPS degradation.

Abstract

In industrial recommendation systems, multi-task learning (learning multiple tasks simultaneously on a single model) is a predominant approach to save training/serving resources and improve recommendation performance via knowledge transfer between the jointly learned tasks. However, multi-task learning often suffers from negative transfer: one or several tasks end up less well optimized than if they had been trained separately. To carefully balance the optimization, we propose a gradient balancing approach called MultiBalance, which is suitable for industrial-scale multi-task recommendation systems. It balances the per-task gradients to alleviate negative transfer, while saving the huge cost of grid search or manual exploration for appropriate task weights. Moreover, compared with prior work that normally balances the per-task gradients of shared parameters, MultiBalance is more efficient since it only requires access to per-task gradients with respect to the shared feature representations. We conduct experiments on Meta's large-scale ads and feeds multi-task recommendation system, and observe that MultiBalance achieves significant gains (e.g., 0.738% improvement for normalized entropy (NE)) with neutral training cost in Queries Per Second (QPS). This is significantly more efficient than prior methods that balance per-task gradients of shared parameters, which incur 70~80% QPS degradation.
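
The efficiency claim in the abstract can be illustrated with the chain rule. The notation below is assumed for illustration and is not taken from the page: $\theta$ denotes the shared parameters, $Z$ the shared feature representation they produce, $L_i$ the loss of task $i$, $J$ the Jacobian of $Z$ with respect to $\theta$, and $\lambda_i$ the balancing weights.

```latex
% Per-task gradient of the shared parameters, via the shared representation Z:
\nabla_{\theta} L_i = J^\top \nabla_{Z} L_i,
\qquad J = \frac{\partial Z}{\partial \theta}.

% Because J is the same for every task, the weighted sum factors:
\sum_{i=1}^{n} \lambda_i \nabla_{\theta} L_i
  = J^\top \Big( \sum_{i=1}^{n} \lambda_i \nabla_{Z} L_i \Big).
```

Under these assumptions, a method that needs each $\nabla_{\theta} L_i$ separately has to backpropagate through the shared layers once per task, whereas weights chosen from the much smaller $\nabla_{Z} L_i$ allow the combined gradient to be pushed through the shared layers in a single backward pass. This is one plausible reading of why the reported training cost stays neutral.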

Paper Structure

This paper contains 14 sections, 2 theorems, 17 equations, 3 figures, 4 tables, 3 algorithms.

Key Result

Lemma 4.1

Suppose we are given matrices $A\in \mathbb{R}^{q\times n}, B\in \mathbb{R}^{p\times q}$ and constants $\ell \geq \mu\geq 0$ satisfying $\mu^2 I \preceq B^\top B\preceq \ell^2 I$. Define $\lambda_{X, *} = \mathop{\mathrm{argmin}}\limits_{\lambda\in \Delta^n}\left\|X\lambd
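
The lemma statement is cut off here, but the part that survives defines a minimum-norm weight vector over the probability simplex $\Delta^n$. As a minimal sketch of how such weights could be computed, the following uses Frank-Wolfe iterations with exact line search. This solver is an assumption chosen for illustration, not necessarily the paper's algorithm, and the function name `min_norm_weights` and its parameters are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weights(columns, max_iter=500, tol=1e-10):
    """Approximate argmin over the simplex of ||X @ lam||^2.

    `columns` is a list of n vectors (the columns of X), e.g. per-task
    gradients. Frank-Wolfe is an illustrative choice of solver here.
    """
    n = len(columns)
    dim = len(columns[0])
    lam = [1.0 / n] * n  # start from uniform weights
    for _ in range(max_iter):
        # y = X @ lam, the current weighted combination of the columns.
        y = [sum(l * col[k] for l, col in zip(lam, columns)) for k in range(dim)]
        # scores[i] = <x_i, y>, proportional to the i-th gradient coordinate.
        scores = [dot(col, y) for col in columns]
        i = min(range(n), key=scores.__getitem__)
        # Frank-Wolfe gap (up to a factor of 2); zero at the optimum.
        gap = dot(y, y) - scores[i]
        if gap <= tol:
            break
        # Move toward vertex e_i; the exact line search has a closed form.
        d = [a - b for a, b in zip(columns[i], y)]
        gamma = min(1.0, gap / dot(d, d))
        lam = [(1.0 - gamma) * l for l in lam]
        lam[i] += gamma
    return lam
```

For two orthogonal unit vectors, `min_norm_weights([[1.0, 0.0], [0.0, 1.0]])` returns equal weights. When one column is a scaled-up copy of the other, all the weight moves to the shorter one, since no mixture can beat it.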

Figures (3)

  • Figure 1: General Framework of Multi-Task Ranking Model.
  • Figure 2: The Visualization of Learning Process of MultiBalance. Green is CVR, Blue is CTR and Red is Conv|imp.
  • Figure :

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Lemma 4.1
  • Theorem 1