Multi-objective Large Language Model Alignment with Hierarchical Experts

Zhuo Li, Guodong Du, Weiyang Guo, Yigeng Zhou, Xiucheng Li, Wenya Wang, Fangming Liu, Yequan Wang, Deheng Ye, Min Zhang, Jing Li

TL;DR

This work tackles the challenge of aligning LLMs to diverse human objectives by introducing HoE, a hierarchical Mixture-of-Experts framework that is lightweight, parameter-efficient, and plug-and-play. HoE decomposes multi-objective alignment into single-preference subproblems, leveraging off-the-shelf single-objective models to build compact LoRA experts via task-SVD, and synthesizes multi-objective capabilities through model merging. A lightweight router expert ensemble plus a tertiary preference routing module enables dynamic, input-conditioned activation of LoRA experts, with Tchebycheff scalarization optimized via Online Mirror Descent within a PPO framework to robustly cover the Pareto frontier. Empirically, HoE achieves superior Pareto fronts across 14 objectives, 200 preferences, and 6 benchmarks, outperforming 15 baselines while reducing training and inference costs, and generalizes to unseen datasets and tasks, indicating strong practical impact for scalable, user-preference-driven LLM alignment.
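The summary does not spell out what "task-SVD" involves. One common reading, offered here only as an assumption, is that the weight difference between a single-objective model and its base model (the task vector) is compressed by a truncated SVD into two low-rank LoRA factors. The function name `extract_lora_expert`, its arguments, and the even split of singular values between the two factors are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def extract_lora_expert(w_base, w_finetuned, rank):
    """Compress a task vector into rank-`rank` LoRA factors (illustrative only).

    Returns (b, a) such that b @ a is the best rank-`rank` approximation
    of w_finetuned - w_base in the Frobenius norm.
    """
    # Task vector: what fine-tuning on one objective changed in this layer.
    delta = w_finetuned - w_base
    # Thin SVD; singular values come back sorted in descending order.
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Split each kept singular value evenly between the two factors
    # (an assumed convention; other splits give the same product).
    sqrt_s = np.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s           # shape (out_features, rank)
    a = sqrt_s[:, None] * vt[:rank]    # shape (rank, in_features)
    return b, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_base = rng.normal(size=(8, 6))
    # Synthetic fine-tuned weights whose change is exactly rank 2.
    w_ft = w_base + rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))
    b, a = extract_lora_expert(w_base, w_ft, rank=2)
    print(np.allclose(w_base + b @ a, w_ft))
```

Under this reading, each expert stores only `rank * (out_features + in_features)` values per layer instead of a full weight matrix, which is where the parameter savings would come from.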

Abstract

Aligning large language models (LLMs) to simultaneously satisfy multiple objectives remains a significant challenge, especially given the diverse and often conflicting nature of human preferences. Existing alignment methods struggle to balance trade-offs effectively, often requiring costly retraining or yielding suboptimal results across the Pareto frontier of preferences. In this paper, we introduce HoE (Hierarchical Mixture-of-Experts), a lightweight, parameter-efficient, and plug-and-play approach that eliminates the need for model training, while enabling LLMs to adapt across the entire Pareto frontier and accommodate diverse user preferences. In particular, HoE consists of three hierarchical components: LoRA Experts, Router Experts, and Preference Routing, reaching optimal Pareto frontiers and achieving a trade-off between parameter size, training cost, and performance. We evaluate HoE across various tasks on 14 objectives and 200 different preferences among 6 benchmarks, demonstrating superior performance over 15 recent baselines. Code is available in the supplementary materials.

Paper Structure

This paper contains 46 sections, 2 theorems, 24 equations, 10 figures, 4 tables, 1 algorithm.

Key Result

Lemma F.1

(Strong Duality) Let Assumption 5 (Policy Feasibility) hold. Then there exists a saddle point $(\theta^{*}, \lambda^{*})$ such that: $\max_{\theta} \min_{\lambda} \mathbb{L}(\theta|\lambda) = \mathbb{L}(\theta^{*}|\lambda^{*}) = \min_{\lambda} \max_{\theta} \mathbb{L}(\theta|\lambda)$
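The lemma concerns a max-min problem over the policy parameters $\theta$ and the multipliers $\lambda$, and the TL;DR names Online Mirror Descent as the optimizer. The sketch below illustrates only the $\lambda$ side. It assumes that $\lambda$ lives on the probability simplex and that $\mathbb{L}$ is linear in $\lambda$, so its gradient with respect to $\lambda$ is a vector of per-objective scores. Neither assumption is stated in this summary, and the names `omd_step` and `minimize_over_simplex` are hypothetical.

```python
import math


def omd_step(lam, grads, lr):
    """One entropic mirror-descent (exponentiated-gradient) step on the simplex."""
    # Subtracting the smallest gradient keeps the exponents bounded above by 0;
    # the shift cancels in the normalization below.
    shift = min(grads)
    unnorm = [l * math.exp(-lr * (g - shift)) for l, g in zip(lam, grads)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def minimize_over_simplex(scores, lr=0.5, steps=200):
    """Minimize sum_i lam_i * scores_i over the simplex by repeated OMD steps."""
    lam = [1.0 / len(scores)] * len(scores)  # start from uniform weights
    for _ in range(steps):
        # For an objective linear in lam, the gradient w.r.t. lam is `scores`.
        lam = omd_step(lam, scores, lr)
    return lam


if __name__ == "__main__":
    # Hypothetical per-objective scores; the second objective is the weakest.
    lam = minimize_over_simplex([0.9, 0.2, 0.5])
    print([round(x, 3) for x in lam])
```

With a linear objective the weights concentrate on the lowest-scoring objective, which is the worst-case focus that a Tchebycheff-style scalarization aims for. In a full training loop the scores would change as $\theta$ is updated between steps, so the weights would track whichever objective currently lags.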

Figures (10)

  • Figure 1: (Left) HoE decomposes the multi-objective alignment problem into a series of single-preference subproblems, each handled by a specialized expert. (Right) HoE employs hierarchical experts, integrating LoRA and router experts to approach the optimal Pareto frontier.
  • Figure 2: Illustration of our HoE approach. The left side illustrates the application scenario, where the model generates a response aligned with the prompt and the given preferences. The bottom-right highlights its three hierarchical components: the LoRA experts, the router experts, and preference routing. The top-right depicts individual components, each serving as an expert for specific weightings, designed for seamless plug-and-play integration within the model.
  • Figure 3: Results of two-objective alignment on HelpAssistant, Reddit Summary and BeaverTails Task with 8 objectives. Compared to the baselines, HoE consistently achieves superior Pareto frontiers.
  • Figure 4: Comparison of alignment results with three objectives (i.e., helpful, harmless and humor) on the Psoups and HelpSteer2 datasets.
  • Figure 5: Five-objective alignment results on HelpSteer. Preference weighting settings are shown in gray. The best results are bolded and the second-best are underlined.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Lemma F.1
  • Theorem F.2