Continual Optimization with Symmetry Teleportation for Multi-Task Learning

Zhipeng Zhou, Ziqiao Meng, Pengcheng Wu, Peilin Zhao, Chunyan Miao

TL;DR

This work tackles optimization conflicts and task imbalance in multi-task learning by introducing COST, a practical continual optimization framework based on symmetry teleportation. When a conflict arises, COST uses a low-rank adapter (LoRA) to teleport the shared backbone to a loss-invariant point with a larger gradient, enforcing loss invariance through a convex-like drift term and guiding gradient advancement via a sharpness-based objective. A historical trajectory reuse strategy preserves optimizer momentum across teleports, enabling continued benefit from advanced optimizers such as Adam. Empirically, COST achieves state-of-the-art or competitive results across diverse MTL benchmarks, and its plug-and-play nature demonstrates broad compatibility with existing MTL methods.
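The idea of moving to a loss-invariant point with a larger gradient can be illustrated on a toy problem. The sketch below is not COST's LoRA-based procedure: it assumes a hypothetical two-parameter loss that depends only on the product `w1 * w2`, so the rescaling `(w1, w2) -> (a * w1, w2 / a)` leaves the loss unchanged, and it replaces the learned teleportation with a simple search over a few candidate scales. All function names here are illustrative.

```python
def loss(w1, w2, y=1.0):
    """Toy loss that depends only on the product w1 * w2 (scaling symmetry)."""
    return 0.5 * (w1 * w2 - y) ** 2


def grad_norm_sq(w1, w2, y=1.0):
    """Squared norm of the gradient of the toy loss with respect to (w1, w2)."""
    residual = w1 * w2 - y
    g1, g2 = residual * w2, residual * w1
    return g1 * g1 + g2 * g2


def teleport(w1, w2, candidates=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Pick the scale a that maximizes the gradient norm.

    The map (w1, w2) -> (a * w1, w2 / a) keeps w1 * w2 fixed, so the loss
    is unchanged. The candidate grid is an assumption made for illustration.
    """
    best = max(candidates, key=lambda a: grad_norm_sq(a * w1, w2 / a))
    return best * w1, w2 / best


w1, w2 = 2.0, 2.0
t1, t2 = teleport(w1, w2)
print("loss before/after:", loss(w1, w2), loss(t1, t2))
print("grad norm^2 before/after:", grad_norm_sq(w1, w2), grad_norm_sq(t1, t2))
```

In this toy setting the symmetry group is known in closed form. For a deep shared backbone it generally is not, which is presumably why the paper optimizes an adapter under a loss-invariance objective instead.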

Abstract

Multi-task learning (MTL) is a widely explored paradigm that enables the simultaneous learning of multiple tasks using a single model. Despite numerous solutions, the key issues of optimization conflict and task imbalance remain under-addressed, limiting performance. Unlike existing optimization-based approaches that typically reweight task losses or gradients to mitigate conflicts or promote progress, we propose a novel approach based on Continual Optimization with Symmetry Teleportation (COST). During MTL optimization, when an optimization conflict arises, we seek an alternative loss-equivalent point on the loss landscape to reduce conflict. Specifically, we utilize a low-rank adapter (LoRA) to facilitate this practical teleportation by designing convergent, loss-invariant objectives. Additionally, we introduce a historical trajectory reuse strategy to continually leverage the benefits of advanced optimizers. Extensive experiments on multiple mainstream datasets demonstrate the effectiveness of our approach. COST is a plug-and-play solution that enhances a wide range of existing MTL methods. When integrated with state-of-the-art methods, COST achieves superior performance.
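Triggering teleportation "when an optimization conflict arises" requires some test for conflict. A minimal sketch follows, assuming that two tasks conflict when the cosine similarity of their gradients on the shared parameters is negative, a convention common in the MTL literature. The exact criterion used by COST is not given in this summary, and the helper names below are hypothetical.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero gradient is treated as non-conflicting
    return dot / (norm_a * norm_b)


def conflicting_pairs(task_grads):
    """Return index pairs (i, j), i < j, whose gradients point in opposing directions."""
    pairs = []
    for i in range(len(task_grads)):
        for j in range(i + 1, len(task_grads)):
            if cosine_similarity(task_grads[i], task_grads[j]) < 0.0:
                pairs.append((i, j))
    return pairs


def should_teleport(task_grads):
    """Assumed trigger rule: teleport whenever at least one task pair conflicts."""
    return len(conflicting_pairs(task_grads)) > 0


# Three hypothetical task gradients on a two-parameter shared backbone.
grads = [[1.0, 0.0], [-1.0, 0.5], [0.5, 1.0]]
print(conflicting_pairs(grads))
print(should_teleport(grads))
```

A real implementation would flatten per-task gradients of the shared backbone into vectors before comparing them; plain lists are used here only to keep the sketch dependency-free.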

Paper Structure

This paper contains 19 sections, 1 theorem, 12 equations, 8 figures, 5 tables.

Key Result

Theorem 1

Assume task loss functions $\mathcal{L}_1, \dots, \mathcal{L}_K$ are differentiable and $\Lambda$-smooth ($\Lambda > 0$) such that $\left\| \nabla\mathcal{L}_i(\bm{\theta}_1) - \nabla\mathcal{L}_i(\bm{\theta}_2) \right\| \le \Lambda \left\| \bm{\theta}_1 - \bm{\theta}_2 \right\|$ for any two poi

Figures (8)

  • Figure 1: The illustration of symmetry teleportation. (a) is the original gradient descent. (b) is the gradient descent with a faster convergence rate after teleporting the start point from (a).
  • Figure 2: Illustration of conflict and imbalance issues in MTL. 'Bal' and 'Imb' represent balanced and imbalanced, while 'N-Con' and 'Con' represent non-conflicting and conflicting.
  • Figure 3: Dominated conflict vs. loss examination. The pink backdrop designates the conflicting area, whereas the green backdrop indicates the non-conflicting area. The blue scatter points are the individual recorded points throughout the optimization process. The red dashed line symbolizes the teleportation occurring from a conflict point to a non-conflict point. An exponential amplification has been applied to the loss values to enhance visual clarity.
  • Figure 4: Illustration of COST. For clarity, we depict a single teleportation step using a 2-task example. Note that LoRA is applied only to the shared backbone.
  • Figure 5: (a) Conflict ratio per epoch on CelebA (40-task) and NYUv2 (3-task) and (b) loss examinations during a single teleportation.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Definition 1: Gradient Similarity
  • Theorem 1
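
The $\Lambda$-smoothness assumption in Theorem 1 has a standard consequence that helps explain why teleporting to a point with a larger gradient can speed up convergence. The bound below is the textbook descent lemma for a single task under plain gradient descent with step size $\eta$ (a symbol introduced here for illustration). It is not the conclusion of Theorem 1, which is truncated in this summary.

$$
\mathcal{L}_i\big(\bm{\theta} - \eta \nabla\mathcal{L}_i(\bm{\theta})\big) \;\le\; \mathcal{L}_i(\bm{\theta}) - \eta\left(1 - \frac{\Lambda \eta}{2}\right) \left\| \nabla\mathcal{L}_i(\bm{\theta}) \right\|^2, \qquad 0 < \eta < \frac{2}{\Lambda}.
$$

For a fixed step size in this range, the guaranteed decrease grows with $\left\| \nabla\mathcal{L}_i(\bm{\theta}) \right\|^2$, so of two points with equal loss, the one with the larger gradient norm admits the larger guaranteed one-step reduction.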