ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data
Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin
TL;DR
ParetoHqD tackles offline multiobjective alignment of large language models by recasting user preferences as directions in reward space and treating data near the Pareto front as high-quality. It employs a two-stage SFT pipeline guided by Pareto high-quality data, with data augmentation to mitigate overfitting and to span concave and convex regions of the front. Across two diverse tasks, ParetoHqD achieves superior Pareto fronts and higher hypervolume than five baselines, while substantially reducing language collapse and maintaining favorable computational efficiency. The work advances practical, personalized, and scalable multiobjective alignment for LLMs by addressing preference representation, data distribution, and training efficiency.
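Hypervolume, the indicator mentioned above, measures the region of reward space dominated by a set of solutions relative to a reference point. The following is a minimal sketch for the two-objective maximization case only. The function name `hypervolume_2d` and the default reference point `(0, 0)` are illustrative assumptions, not the evaluation code or settings used in the paper.

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` and bounded below by `ref`.

    Both objectives are maximized. `ref` is an assumed reference point;
    the paper's actual choice is not stated in this summary.
    """
    # Keep only points that are strictly better than the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)

    area = 0.0
    best_y = ref[1]  # highest second objective seen so far
    for x, y in pts:
        if y > best_y:
            # Add the new horizontal strip this point contributes.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


front = [(0.9, 0.1), (0.6, 0.6), (0.1, 0.9)]
hv = hypervolume_2d(front)
```

Dominated points add no area, so including them leaves the value unchanged. A larger hypervolume indicates a front that is closer to the ideal rewards, more spread out, or both.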
Abstract
Aligning large language models with multiple human expectations and values is crucial for ensuring that they adequately serve a variety of user needs. To this end, offline multiobjective alignment algorithms such as the Rewards-in-Context algorithm have shown strong performance and efficiency. However, inappropriate preference representations and training with imbalanced reward scores limit the performance of such algorithms. In this work, we introduce ParetoHqD, which addresses these issues by representing human preferences as preference directions in the objective space and regarding data near the Pareto front as "high-quality" data. For each preference, ParetoHqD follows a two-stage supervised fine-tuning process, where each stage uses an individual Pareto high-quality training set that best matches its preference direction. Experimental results demonstrate the superiority of ParetoHqD over five baselines on two multiobjective alignment tasks.
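To make the idea of selecting Pareto high-quality data for a preference direction concrete, here is a minimal sketch under stated assumptions. It treats each sample as a tuple of non-negative reward scores, keeps the non-dominated samples, and ranks them by cosine similarity to the preference direction. The cosine matching rule and the names `pareto_front` and `select_for_preference` are illustrative assumptions; the abstract does not specify the matching criterion, so this should not be read as the authors' procedure.

```python
import math


def dominates(a, b):
    """True if reward vector `a` Pareto-dominates `b` (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Indices of the non-dominated reward vectors in `points`."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def select_for_preference(points, direction, k):
    """Pick the k Pareto-front samples best aligned with `direction`.

    Assumption: alignment is measured by cosine similarity in reward space.
    """
    front = pareto_front(points)
    ranked = sorted(front, key=lambda i: cosine(points[i], direction), reverse=True)
    return ranked[:k]


# Hypothetical two-objective reward scores for five responses.
rewards = [(0.9, 0.1), (0.1, 0.9), (0.6, 0.6), (0.3, 0.3), (0.5, 0.2)]
balanced = select_for_preference(rewards, direction=(1.0, 1.0), k=1)
first_only = select_for_preference(rewards, direction=(1.0, 0.0), k=1)
```

In this toy data the samples at indices 3 and 4 are dominated by `(0.6, 0.6)` and never enter the training set, whichever direction is requested. The quadratic dominance check is adequate for illustration but would need a faster non-dominated sort on a large dataset.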
