
No More Tuning: Prioritized Multi-Task Learning with Lagrangian Differential Multiplier Methods

Zhengxing Cheng, Yuheng Huang, Zhixuan Zhang, Dan Ou, Qingwen Liu

TL;DR

This paper tackles the challenge of prioritizing tasks in multi-task learning without the burden of hyperparameter tuning. It introduces No More Tuning (NMT), a Lagrangian differential multiplier framework that enforces high-priority task performance as inequality constraints while sequentially optimizing lower-priority tasks, all within gradient-descent compatible workflows. Theoretical analysis establishes strong duality under reasonable assumptions and demonstrates convergence, with a re-scaling mechanism to stabilize training. Empirically, NMT improves high-priority metrics on public MTL datasets across multiple architectures and yields substantial gains in an industrial Taobao search system, while preserving or enhancing lower-priority objectives, illustrating broad applicability and practical impact.
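As a rough illustration of the idea summarized above, for two tasks the prioritized problem might be written as follows. This is a sketch under assumptions, not the paper's exact formulation: the losses $\ell_1$ (high priority) and $\ell_2$ (low priority), the tolerance $\epsilon \ge 0$, and the step sizes $\eta$ and $\eta_\lambda$ are notation introduced here for illustration.

$$
\begin{aligned}
\text{Step 1:}\quad & \ell_1^{*} = \min_{\theta}\ \ell_1(\theta) \\
\text{Step 2:}\quad & \min_{\theta}\ \ell_2(\theta) \quad \text{s.t.} \quad \ell_1(\theta) - \ell_1^{*} \le \epsilon \\
\text{Lagrangian:}\quad & \mathcal{L}(\theta, \lambda) = \ell_2(\theta) + \lambda \bigl(\ell_1(\theta) - \ell_1^{*} - \epsilon\bigr), \qquad \lambda \ge 0 \\
\text{Updates:}\quad & \theta \leftarrow \theta - \eta\, \nabla_{\theta} \mathcal{L}(\theta, \lambda), \qquad \lambda \leftarrow \max\bigl(0,\ \lambda + \eta_{\lambda} \bigl(\ell_1(\theta) - \ell_1^{*} - \epsilon\bigr)\bigr)
\end{aligned}
$$

Here the multiplier $\lambda$ is learned by gradient ascent rather than set by hand: it grows while the high-priority constraint is violated and shrinks toward zero once the constraint is satisfied, which is what removes the manually tuned loss weight.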

Abstract

Given the ubiquity of multiple tasks in practical systems, Multi-Task Learning (MTL) has found widespread application across diverse domains. In real-world scenarios, these tasks often have different priorities. For instance, in web search, relevance is often prioritized over other metrics, such as click-through rates or user engagement. Existing frameworks pay insufficient attention to prioritization among tasks and typically adjust task-specific loss function weights to differentiate task priorities. However, this approach encounters challenges as the number of tasks grows, leading to exponential increases in hyper-parameter tuning complexity. Furthermore, the simultaneous optimization of multiple objectives can negatively impact the performance of high-priority tasks due to interference from lower-priority tasks. In this paper, we introduce a novel multi-task learning framework employing Lagrangian Differential Multiplier Methods for step-wise multi-task optimization. It is designed to boost the performance of high-priority tasks without interference from other tasks. Its primary advantage lies in its ability to automatically optimize multiple objectives without requiring balancing hyper-parameters for different tasks, thereby eliminating the need for manual tuning. Additionally, we provide theoretical analysis demonstrating that our method comes with optimization guarantees, enhancing the reliability of the process. We demonstrate its effectiveness through experiments on multiple public datasets and through its application in Taobao search, a large-scale industrial search ranking system, where it yields significant improvements across various business metrics.
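The step-wise procedure described in the abstract can be sketched on a toy problem. The code below is a minimal illustration of a generic differential multiplier update (gradient descent on the parameters, projected gradient ascent on the multiplier), not the authors' implementation. The two quadratic losses, the `slack` tolerance, the learning rates, and all function names are assumptions chosen for the example.

```python
# Toy two-task problem in 2-D (hypothetical losses, chosen for illustration).
def primary_loss(x):
    # High-priority task: minimized anywhere on the line x[0] == 1.
    return (x[0] - 1.0) ** 2

def primary_grad(x):
    return [2.0 * (x[0] - 1.0), 0.0]

def secondary_loss(x):
    # Low-priority task: unconstrained minimum at (-1, 2).
    return (x[0] + 1.0) ** 2 + (x[1] - 2.0) ** 2

def secondary_grad(x):
    return [2.0 * (x[0] + 1.0), 2.0 * (x[1] - 2.0)]

def fit_primary(x, lr=0.05, steps=200):
    # Step 1: plain gradient descent on the high-priority loss only.
    for _ in range(steps):
        g = primary_grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def fit_secondary_constrained(x, primary_best, slack=0.25,
                              lr=0.05, lr_lam=0.5, steps=2000):
    # Step 2: minimize the secondary loss subject to
    #   primary_loss(x) - primary_best <= slack   (slack is an assumption).
    lam = 0.0
    for _ in range(steps):
        violation = primary_loss(x) - primary_best - slack
        g1, g2 = primary_grad(x), secondary_grad(x)
        # Descent on x for the Lagrangian: secondary + lam * constraint.
        x = [xi - lr * (b + lam * a) for xi, a, b in zip(x, g1, g2)]
        # Ascent on lam, projected so the multiplier stays non-negative.
        lam = max(0.0, lam + lr_lam * violation)
    return x, lam

x_primary = fit_primary([0.0, 0.0])
best = primary_loss(x_primary)
x_final, lam_final = fit_secondary_constrained(x_primary, best)
print(x_final, lam_final, primary_loss(x_final), secondary_loss(x_final))
```

For this toy problem the constrained optimum can be worked out by hand: the secondary loss pulls `x[0]` toward -1 until the primary loss reaches the allowed slack of 0.25, which happens at `x[0] = 0.5`, while `x[1]` is free to reach 2. No loss weight is set manually; the multiplier `lam` takes the place of that hyper-parameter and is adjusted by the constraint violation itself.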

Paper Structure

This paper contains 26 sections, 3 theorems, 22 equations, 3 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

Slater's condition holds for CO

Figures (3)

  • Figure 1: Optimization trajectories for two strategies. The traditional approach compromises between tasks, often sub-optimally affecting the primary objective. The NMT framework first optimizes the primary task, then refines secondary tasks while maintaining the primary task's performance.
  • Figure 2: AUC performance comparison for different model configurations across the Like and Finish tasks. The colored lines show the performance of each model under different loss-function weights, and the star markers of the same color show the performance of the corresponding NMT-optimized model. The NMT-enhanced models demonstrate a significant improvement over their standard counterparts.
  • Figure 3: Training metrics for the pay (high priority) + relevance (low priority) task pair, in which relevance is the secondary task. The top line shows $\lambda$ during training. The middle line illustrates the fluctuation of the pay task loss around its optimal value. The bottom line displays the loss of the secondary task, relevance, during training.

Theorems & Definitions (5)

  • Proposition 1
  • proof
  • Proposition 2
  • Theorem 1
  • proof