PSMGD: Periodic Stochastic Multi-Gradient Descent for Fast Multi-Objective Optimization

Mingjing Xu, Peizhong Ju, Jia Liu, Haibo Yang

TL;DR

PSMGD reduces the heavy per-iteration cost of gradient manipulation in multi-objective optimization by recomputing the objective weights only periodically and reusing them in between, exploiting the observation that the dynamic weights change little over short intervals. The method achieves state-of-the-art convergence rates for strongly convex, general convex, and non-convex objectives. It also introduces backpropagation (BP) complexity as a measure of computational workload and shows that the BP complexity can become independent of the number of objectives when the recomputation interval scales with that number. Experiments on QM-9 and NYU-v2 complement the theory: PSMGD delivers comparable or superior performance with significantly shorter training times than existing MOO methods. Together, these results position PSMGD as a practical, scalable approach to fast multi-objective optimization in deep learning and related tasks.
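The link between the recomputation interval and BP complexity can be made concrete with a small counting sketch. It rests on a cost model that is an assumption here, not a statement from the paper: a weight-recomputation step costs one backward pass per objective, and every other step costs a single backward pass through the weighted sum of losses. The function name and its parameters are hypothetical.

```python
def backprops_per_iteration(num_objectives: int, period: int) -> float:
    """Average number of backward passes per iteration.

    Assumed cost model (illustrative only):
      - a recomputation step needs one backward pass per objective (S passes);
      - each of the other R - 1 steps in a period needs one backward pass
        through the weighted sum of the losses.
    """
    if num_objectives < 1 or period < 1:
        raise ValueError("num_objectives and period must be positive")
    passes_per_period = num_objectives + (period - 1)
    return passes_per_period / period


if __name__ == "__main__":
    for s in (2, 10, 100):
        every_step = backprops_per_iteration(s, period=1)
        periodic = backprops_per_iteration(s, period=s)
        print(f"S={s}: R=1 -> {every_step:.2f}, R=S -> {periodic:.2f}")
```

Under this model, recomputing at every step (R = 1) costs S passes per iteration, while choosing R equal to S gives (2S - 1)/S, which stays below 2 regardless of S.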

Abstract

Multi-objective optimization (MOO) lies at the core of many machine learning (ML) applications that involve multiple, potentially conflicting objectives (e.g., multi-task learning and multi-objective reinforcement learning, among many others). Despite the long history of MOO, recent years have witnessed a surge of interest within the ML community in gradient manipulation algorithms for MOO, thanks to the availability of gradient information in many ML problems. However, existing gradient manipulation methods for MOO often suffer from long training times, primarily because they compute dynamic weights by solving an additional optimization problem to determine a common descent direction that can decrease all objectives simultaneously. To address this challenge, we propose a new and efficient algorithm called Periodic Stochastic Multi-Gradient Descent (PSMGD) to accelerate MOO. PSMGD is motivated by the key observation that the dynamic weights across objectives change only slightly under minor updates over short intervals during the optimization process. Consequently, our PSMGD algorithm is designed to compute these dynamic weights periodically and reuse them between computations, thereby effectively reducing the computational overhead. Theoretically, we prove that PSMGD can achieve state-of-the-art convergence rates for strongly convex, general convex, and non-convex functions. Additionally, we introduce a new computational complexity measure, termed backpropagation complexity, and demonstrate that PSMGD could achieve an objective-independent backpropagation complexity. Through extensive experiments, we verify that PSMGD can provide comparable or superior performance to state-of-the-art MOO algorithms while significantly reducing training time.
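The periodic-reuse idea can be illustrated with a minimal sketch on a two-objective toy problem. Everything in it is an assumption made for illustration rather than the authors' implementation: the quadratic objectives, the closed-form two-objective min-norm weight, the function names, and the hyperparameters are hypothetical, gradients are exact rather than stochastic, and any weight smoothing the full algorithm may use is omitted.

```python
def min_norm_weight(g1, g2):
    """Weight lam in [0, 1] minimizing ||lam * g1 + (1 - lam) * g2||^2."""
    diff = [u - v for u, v in zip(g1, g2)]
    denom = sum(c * c for c in diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any weight gives the same direction
    lam = -sum(c * v for c, v in zip(diff, g2)) / denom
    return min(1.0, max(0.0, lam))


def periodic_weight_descent(x0, a, b, period, steps=200, lr=0.1):
    """Descend on f1 = 0.5*||x - a||^2 and f2 = 0.5*||x - b||^2.

    The weight is recomputed only every `period` steps and reused otherwise.
    Returns the final iterate, the last weight, and the number of weight solves.
    """
    x = list(x0)
    lam = 0.5
    weight_solves = 0
    for t in range(steps):
        # Both gradients are cheap in this toy problem; in a deep network the
        # reuse steps would need only one backward pass on the weighted loss.
        g1 = [xi - ai for xi, ai in zip(x, a)]
        g2 = [xi - bi for xi, bi in zip(x, b)]
        if t % period == 0:
            lam = min_norm_weight(g1, g2)
            weight_solves += 1
        direction = [lam * u + (1.0 - lam) * v for u, v in zip(g1, g2)]
        x = [xi - lr * di for xi, di in zip(x, direction)]
    return x, lam, weight_solves


if __name__ == "__main__":
    x, lam, solves = periodic_weight_descent(
        [0.3, 2.0], [0.0, 0.0], [1.0, 0.0], period=4
    )
    print(x, lam, solves)
```

In this toy setting the Pareto set is the segment between `a` and `b`, so the iterate should approach that segment while the weight is solved for only once every `period` steps.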

Paper Structure

This paper contains 27 sections, 8 theorems, 41 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Theorem 3.5

Under Assumptions assum:LSmooth to assum:BG, when each objective is bounded by $F$ ($f_s(\mathbf{x}) \leq F$ for all $s \in [S]$), the sequence of iterates generated by the PSMGD algorithm on non-convex functions satisfies: Setting $1 - \alpha_t = \min \left\{ \frac{\eta_t}{\eta_1}, \frac{\eta_t}{\eta_1 \sqrt{t} \max_{s \in [S]} | \hat{\lambda}_{t}^s - \lambda_{t-R}^s |} \right\}$, $\eta_t = \mathcal{O}(\frac{1}{\sqrt{ …

Figures (7)

  • Figure 1: Convergence behavior on a pedagogical example with 10 runs for each algorithm: (a) Solutions obtained from random linear scalarization. (b) Solutions obtained from the MGDA method. (c) Solutions obtained from the PSMGD method proposed in this paper, with weights updated every 4 iterations ($R=4$) throughout training. (d) Weight trajectories over iterations for 5 runs. PSMGD successfully generates a set of widely distributed Pareto solutions with different trade-offs. Details of the pedagogical example are given in the paper (section label "sec: syn").
  • Figure 2: Test loss in training (300 epochs) on QM-9.
  • Figure 3: Test loss of segmentation, depth, and surface normal in training (200 epochs) on NYU-v2.
  • Figure 4: PSMGD achieves a favorable balance between test accuracy and training time per epoch and per iteration. (a) (Left) Average test accuracy: mean and 95% CI (10 runs). (b) (Middle) Box plots of the average training time per epoch (10 runs). (c) (Right) Box plots of the average training time per iteration (10 epochs).
  • Figure 5: PSMGD achieves a favorable balance between per-task accuracy, convergence complexity, and convergence time on MultiMNIST. (a) (Left) Results for MultiMNIST with Task 1 and Task 2 accuracy. (b) (Middle) Average backpropagation complexity for convergence: mean and 95% CI (10 runs). (c) (Right) Box plots of convergence time (10 runs).
  • ...and 2 more figures

Theorems & Definitions (18)

  • Definition 1: (Weak) Pareto Optimality
  • Definition 2: Pareto Stationarity
  • Theorem 3.5: Non-Convex Functions
  • Theorem 3.6: General Convex Functions
  • Theorem 3.7: $\mu$-Strongly Convex Functions
  • Remark 3.8
  • Definition 3: Backpropagation complexity
  • Remark 3.9
  • Remark 3.10
  • Lemma A.1: Lemma 2 and Lemma 7 in zhou2022convergence
  • ...and 8 more