M-HOF-Opt: Multi-Objective Hierarchical Output Feedback Optimization via Multiplier Induced Loss Landscape Scheduling

Xudong Sun, Nutan Chen, Alexej Gossmann, Matteo Wohlrapp, Yu Xing, Carla Feistner, Emilio Dorigatti, Felix Drost, Daniele Scarcella, Lisa Beer, Carsten Marr

TL;DR

This work addresses the challenge of optimizing multiple loss terms in domain-generalization settings where conventional hyperparameter tuning is impractical. It introduces M-HOF-Opt, a hierarchical, output-feedback framework that jointly adapts model parameters and loss-term multipliers via a probabilistic graphical model with a hypervolume-based objective and a PI-like controller. By decomposing the problem into constrained sub-goals with shrinking reference bounds, it achieves Pareto descent without modifying the inner optimizer and without heavy memory burdens. The method demonstrates robust out-of-domain generalization across multi-term losses (e.g., DIVA on PACS) and reduces sensitivity to controller hyperparameters, offering scalable, resource-efficient multi-objective optimization for deep learning.

Abstract

We propose a probabilistic graphical model of the joint evolution of model parameters and multipliers, with a hypervolume-based likelihood that promotes multi-objective descent in structural risk minimization. We address multi-objective model parameter optimization via a surrogate single-objective penalty loss with time-varying multipliers, which is equivalent to online scheduling of the loss landscape. The multi-objective descent goal is dispatched hierarchically into a series of constrained optimization sub-problems whose bounds shrink according to Pareto dominance. Each bound serves as the setpoint for the low-level multiplier controller, which schedules loss landscapes via output feedback of each loss term. Our method closes the loop around the model parameter dynamics, circumvents the excessive memory requirements and extra computational burden of existing multi-objective deep learning methods, and is robust against variation in controller hyperparameters, as demonstrated on domain generalization tasks with multi-dimensional regularization losses.
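The output-feedback idea in the abstract can be sketched as a PI-like update of a single multiplier. This is only an illustrative sketch, not the paper's controller: the function name `update_multiplier`, the exponential update form, the gains `k_p` and `k_i`, and the clipping bounds are all assumptions made here for illustration.

```python
import math


def update_multiplier(mu, reg_value, setpoint, integral,
                      k_p=0.5, k_i=0.1, mu_min=1e-6, mu_max=1e6):
    """One PI-like step on a single multiplier (hypothetical form).

    The multiplier grows when the measured regularization loss R exceeds
    its setpoint b and shrinks when R is below it.
    """
    error = reg_value - setpoint          # output feedback: e = R - b
    integral = integral + error           # accumulated error (the "I" part)
    log_step = k_p * error + k_i * integral
    log_step = max(-1.0, min(1.0, log_step))  # bound the per-step change
    new_mu = mu * math.exp(log_step)      # multiplicative update keeps mu > 0
    new_mu = max(mu_min, min(mu_max, new_mu))
    return new_mu, integral


# R above the setpoint: the penalty on R is strengthened.
mu_up, state = update_multiplier(1.0, reg_value=2.0, setpoint=1.0, integral=0.0)
# R below the setpoint: the penalty on R is relaxed.
mu_down, _ = update_multiplier(1.0, reg_value=0.5, setpoint=1.0, integral=0.0)
print(mu_up > 1.0, mu_down < 1.0)
```

In a training loop, one such update per regularization term would run between optimizer steps on the penalized loss $\ell(\cdot)+\mu R(\cdot)$, so the inner optimizer itself stays unchanged.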

Paper Structure

This paper contains 37 sections, 8 theorems, 37 equations, 8 figures, 2 tables, 1 algorithm.

Key Result

Proposition 3.1

With $b^{(k)}$ defined in \ref{eq:b_pareto_descent}, suppose the constrained optimization problem \cite{bertsekas2014constrained} in \ref{eq:main_obj}, \ref{eq:feasibility_r}, started from $\theta^{(k)}$ and run for $s_k+m_k$ iterations ($s_k>0$ and $m_k\ge 0$), has a solution. Then we achieve multi-objective descent in the sense of \ref{def:pareto-descent-operator} at step $k+s_k+m_k$ compared to step $k$.
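One plausible reading of the shrinking bound can be sketched as follows. The exact definition of $b^{(k)}$ is not reproduced in this summary, so the rule below (adopt the newly measured regularization values as the setpoint only when they Pareto-dominate the current one) and the helper names `dominates` and `shrink_setpoint` are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every component and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def shrink_setpoint(setpoint, reg_values):
    """Hypothetical setpoint rule: tighten the bound only on Pareto improvement."""
    if dominates(reg_values, setpoint):
        return tuple(reg_values)   # new, tighter bound
    return tuple(setpoint)         # otherwise keep the old bound


b = (1.0, 2.0)
b = shrink_setpoint(b, (0.8, 2.0))   # dominates: bound shrinks to (0.8, 2.0)
b = shrink_setpoint(b, (0.5, 2.5))   # trade-off, not dominance: bound is kept
print(b)
```

Under this rule the setpoint never grows in any component, which matches the "shrinking bounds" description in the abstract.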

Figures (8)

  • Figure 1: Control diagram illustrating the hierarchical output feedback optimization process with multi-objective setpoint adaptation. The uncontrolled plant corresponds to the open-loop dynamic $\theta^{+}=f_{\theta}(\mu, \theta, \cdot)$ defined in \ref{def:model_para_dyn}, obtained by optimizing \ref{eq:srm} with a low-level optimization algorithm that lacks feedback. The $\mu$ controller adjusts the multiplier $\mu$, and thus schedules the loss landscape as illustrated in Figure 4, based on the difference $e$ between the setpoint $b$ and the measured output components $R(\cdot)$, guiding the optimization of the model parameters $\theta$ through a feedback loop. The setpoint $b$ is adjusted via feedback from $\ell(\cdot)$ and $R(\cdot)$, which forms a higher level of the hierarchy. See Figure 2 for the probabilistic description of the closed-loop behavior of this control diagram.
  • Figure 2: Probabilistic graphical model for the sequential decision process of joint model parameter and multiplier adaptation in the multi-domain structural risk minimization of \ref{eq:srm}. See Figure 1 for the control diagram counterpart of the same process.
  • Figure 3: Illustration of $e\mathcal{HV}$ in \ref{def:hv}: We take $[\ell(\theta^{(0)}, \cdot), R(\theta^{(0)})]$ as the reference point. Suppose $\{A,B,C,D,E,F\}$ in the illustration constitutes the function values of the reachable set $v\mathcal{R}\left(\theta^{(0)},f_{\theta}(\cdot)\right)$ in \ref{def:reachable_set}. For any $\theta$ corresponding to the point $A$ in the illustration, with coordinates $[\ell(\theta, \cdot), R(\theta, \cdot)]$, $e\mathcal{HV}$ maps $\theta$ to $\mathcal{C}_{\ell(\cdot), R(\cdot)}\left(\theta, f_{\theta}(\cdot),\theta^{(0)}\right)$, then to $\mathcal{E}_{\ell(\cdot), R(\cdot)}\left(\theta,f_{\theta}(\cdot),\theta^{(0)}\right)$ (see \ref{def:pareto_ec}), which is the point set $\{A,B,C,D\}$. It then calculates the dominated hypervolume with respect to the reference point (the union of the shaded rectangles).
  • Figure 4: Illustration of multiplier induced loss landscape scheduling, used as the control signal in Figure 1: The lifted landscape $\ell(\cdot)+\mu^{(k+1)}R(\cdot)$ (dotted curve), scheduled at iteration $k+1$, enables the model parameter dynamic to overcome the local minimum $\bar{\theta}^{(k)}_{\mu^{(k)}}$ of the old loss landscape $\ell(\cdot)+\mu^{(k)}R(\cdot)$ (solid curve), scheduled at iteration $k$. This can be pictured as two balls of different colors rolling along the corresponding loss landscapes. The figure showcases a scenario corresponding to \ref{def:reg-pareto-slider}: compared to $\theta^{(k)}$, $\theta^{(k+1)}$ has a decreased $\ell(\cdot)+\mu^{(k+1)}R(\cdot)$ value but an increased $\ell(\cdot)+\mu^{(k)}R(\cdot)$ value.
  • Figure 5: Our method drives the different loss terms of $R(\cdot)$, at different scales and rates, towards the setpoint, which further promotes setpoint shrinkage. In the top row, we show the multiplier dynamic as the controller output signal in \ref{eq:pid} across training epochs. In the bottom row, we present the tracking behavior of the corresponding regularization loss $R(\cdot)$ in \ref{eq:loss_diva_srm} with respect to the setpoint $b$ defined in \ref{eq:setpoint_ada_abs}.
  • ...and 3 more figures
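
The "union of the shaded rectangles" in the Figure 3 caption is the standard dominated hypervolume of a two-objective point set with respect to a reference point. A minimal sketch of that computation for minimization problems follows; it shows only the plain two-dimensional hypervolume, not the paper's full map from $\theta$ through the reachable set, and the function names `pareto_front` and `hypervolume_2d` are chosen here for illustration.

```python
def pareto_front(points):
    """Return the non-dominated points (minimization), sorted by first objective."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points, reference):
    """Area dominated by the points and bounded by the reference point."""
    front = [
        p for p in pareto_front(points)
        if p[0] < reference[0] and p[1] < reference[1]
    ]
    area = 0.0
    prev_y = reference[1]
    # On a 2D front sorted by ascending x, y is strictly descending,
    # so each point adds one new horizontal slab of the union.
    for x, y in front:
        area += (reference[0] - x) * (prev_y - y)
        prev_y = y
    return area


# (3, 3) is dominated by (2, 2) and does not change the result.
values = [(1, 3), (2, 2), (3, 1), (3, 3)]
print(hypervolume_2d(values, reference=(4, 4)))  # 6.0
```

Here each tuple plays the role of a pair $[\ell(\theta,\cdot), R(\theta,\cdot)]$ with a scalar $R$; a larger hypervolume means the point set has moved further below the reference point in both objectives.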

Theorems & Definitions (40)

  • Definition 2.1: Model parameter dynamic system
  • Definition 2.2: $R$ dominance and non-dominance
  • Definition 2.3: Reachability set and value
  • Definition 2.4: Non-dominant set map
  • Remark 3.1
  • Remark 3.2
  • Definition 3.1
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5: The dual relationship between estimation and control
  • ...and 30 more