A Framework for Controllable Multi-objective Learning with Annealed Stein Variational Hypernetworks

Minh-Duc Nguyen, Dung D. Le

TL;DR

This work tackles learning the complete Pareto front in multi-objective learning while preserving solution diversity. It introduces SVH-MOL, which combines Stein Variational Gradient Descent with a hypernetwork to generate and iteratively refine a population of Pareto-optimal solutions, enabling dense front coverage. An annealing schedule is proposed to balance exploration and convergence, with three scalarization strategies guiding the driving force. Empirical results on synthetic benchmarks and real-world multi-task problems show improved hypervolume and front coverage, demonstrating robust, scalable performance across diverse objective landscapes.

Abstract

Pareto Set Learning (PSL) is a popular and efficient approach to obtaining the complete set of optimal solutions in Multi-objective Learning (MOL). A set of optimal solutions approximates the Pareto set, and its image is a dense set of points on the Pareto front in objective space. However, some current methods face a challenge: how to keep the Pareto solutions diverse while maximizing the hypervolume. In this paper, we propose a novel method to address this challenge, which employs Stein Variational Gradient Descent (SVGD) to approximate the entire Pareto set. SVGD pushes a set of particles towards the Pareto set by applying a form of functional gradient descent, which helps the solutions both converge and diversify. Additionally, we employ diverse gradient direction strategies to thoroughly investigate a unified framework for SVGD in multi-objective optimization and adapt this framework with an annealing schedule to promote stability. We introduce our method, SVH-MOL, and validate its effectiveness through extensive experiments on multi-objective problems and multi-task learning, demonstrating its superior performance.
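
To make the particle update described in the abstract concrete, here is a minimal sketch of one generic SVGD step with an RBF kernel. It is not the paper's exact update rule. The names `svgd_step`, `drive`, `bandwidth`, and `repulsion_weight` are hypothetical, and we assume the driving force is supplied by the caller (for example, the negative gradient of a scalarized loss at each particle).

```python
import numpy as np

def rbf_kernel(x, bandwidth):
    """Return K[j, i] = k(x_j, x_i) and grad_K[j, i] = d k(x_j, x_i) / d x_j."""
    diff = x[:, None, :] - x[None, :, :]           # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    grad_K = -diff / bandwidth ** 2 * K[:, :, None]
    return K, grad_K

def svgd_step(x, drive, step_size=0.1, bandwidth=1.0, repulsion_weight=1.0):
    """One generic SVGD step for particles x of shape (n, d).

    drive: (n, d) array, assumed to be the per-particle driving direction
           (e.g. the negative gradient of a scalarized loss).
    repulsion_weight: hypothetical knob scaling the kernel-gradient term.
    """
    n = x.shape[0]
    K, grad_K = rbf_kernel(x, bandwidth)
    attract = K.T @ drive / n            # kernel-smoothed driving force
    repulse = grad_K.sum(axis=0) / n     # pushes nearby particles apart
    return x + step_size * (attract + repulsion_weight * repulse)

# Two particles with no driving force: the kernel term alone spreads them.
particles = np.array([[0.0], [1.0]])
moved = svgd_step(particles, drive=np.zeros_like(particles))
```

The first term pulls every particle along the kernel-weighted driving direction, which gives convergence. The second term is the kernel gradient, which keeps particles from collapsing onto one another and so supplies the diversity.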

Paper Structure

This paper contains 25 sections, 17 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: The Pareto Set Learning (PSL) framework. (Left) In a general multi-objective problem, the hypernetwork $h(r_i, \phi)$ maps a preference vector $r_i$ to a solution $x_i$. (Right) In multi-task learning, the hypernetwork generates the parameters $\theta_i$ for a target network $TN(\cdot, \theta_i)$ that processes data for multiple tasks.
  • Figure 2: Illustration of the trade-off controlled by the diversity hyperparameter $\alpha$ (see Eq. 11). A large $\alpha$ (red dots) prioritizes the repulsive force, leading to a diverse but suboptimal Pareto set (poor convergence). A small $\alpha$ (black dots) results in a well-converged front but with potentially less spread.
  • Figure 3: Visualization of Pareto fronts on the three-objective RE37 benchmark. The plots compare baseline PHN methods (top row) with our A-SVH-MOL methods (bottom row) using Linear (LS), Tchebyshev (TCH), and Smooth Tchebyshev (STCH) scalarization. Our A-SVH-MOL variants achieve superior coverage and diversity.
  • Figure 4: Generated Pareto fronts on multi-task image classification benchmarks: (Left) Multi-MNIST, (Center) Multi-Fashion, and (Right) Multi-MNIST+Fashion. Our A-SVH-MOL method (black line) is compared against several PHN baselines. On all three datasets, A-SVH-MOL achieves a superior Pareto front, representing a better trade-off between the two task losses.
  • Figure 5: Ablation study on the initial annealing period ($T_0$) on multi-task learning. The plots compare the Hypervolume (HV) performance of (Left) a standard cyclical annealing schedule against (Right) our proposed annealing method (Eq. 12). Our method shows stable and improving performance with a larger $T_0$, while the cyclical method becomes unstable and degrades.
  • ...and 2 more figures
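
The three scalarizations named in the Figure 3 caption (LS, TCH, STCH) can be sketched in their commonly used textbook forms. The paper's own equations are not reproduced in this outline, so the forms below are assumptions. The function names, the ideal point `z`, and the smoothing parameter `mu` are likewise hypothetical.

```python
import numpy as np

def linear_scalarization(f, r):
    """LS: preference-weighted sum of the objective values."""
    return float(np.dot(r, f))

def tchebyshev(f, r, z):
    """TCH: largest weighted gap to an assumed ideal point z."""
    return float(np.max(r * (f - z)))

def smooth_tchebyshev(f, r, z, mu=0.1):
    """STCH: log-sum-exp smoothing of TCH with temperature mu (assumed form)."""
    a = r * (f - z) / mu
    m = np.max(a)                        # shift for numerical stability
    return float(mu * (m + np.log(np.sum(np.exp(a - m)))))

f = np.array([1.0, 3.0])                 # objective values at one solution
r = np.array([0.5, 0.5])                 # preference vector
z = np.zeros(2)                          # assumed ideal point
ls = linear_scalarization(f, r)
tch = tchebyshev(f, r, z)
stch = smooth_tchebyshev(f, r, z, mu=0.1)
```

In this form, STCH is bounded below by TCH and exceeds it by at most `mu * log(m)` for `m` objectives, so a small `mu` tracks the max closely while staying differentiable.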

Theorems & Definitions (4)

  • Definition 1: Dominance
  • Definition 2: Pareto Optimal Solution
  • Definition 3: Weakly Pareto Optimal Solution
  • Definition 4: Pareto Set/Pareto Front
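
The outline lists only the definition titles, so as an illustration here is a small sketch of the standard notion of dominance for minimization problems, together with a filter that keeps the non-dominated points. We assume all objectives are minimized. The helper names `dominates` and `pareto_mask` are hypothetical and not taken from the paper.

```python
import numpy as np

def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_mask(F):
    """Boolean mask over the rows of F (n points, m objectives) that no other row dominates."""
    F = np.asarray(F)
    n = F.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and dominates(F[j], F[i]):
                mask[i] = False
                break
    return mask

# Four candidate objective vectors; (3, 3) is dominated by (2, 2).
F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
front = F[pareto_mask(F)]
```

The pairwise check is quadratic in the number of points, which is fine for illustration but would need a faster routine for large populations.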