OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Xiangyu Zhao; Shengyuan Ding; Zicheng Zhang; Haian Huang; Maosong Cao; Weiyun Wang; Jiaqi Wang; Xinyu Fang; Wenhai Wang; Guangtao Zhai; Haodong Duan; Hua Yang; Kai Chen

TL;DR

This work addresses the gap in human preference alignment for open-source multi-modal LLMs (MLLMs) with two contributions. The first is OmniAlign-V, an open-ended multi-modal SFT dataset of ~200K samples generated through a tailored data synthesis pipeline. The second is MM-AlignBench, a high-quality, human-annotated benchmark for evaluating alignment with human values. The authors show that fine-tuning on OmniAlign-V, via either supervised fine-tuning (SFT) or direct preference optimization (DPO), significantly improves alignment with human preferences while preserving or enhancing performance on standard VQA benchmarks. A key insight, supported by ablation and benchmark results, is that high-quality multi-modal data, not merely better language-only data, is crucial for improving multi-modal alignment. The work releases the data, benchmark, code, and checkpoints, and highlights the need for specialized multi-modal alignment data to build practical, human-aligned MLLMs for real-world interactions.

Abstract

Recent advancements in open-source multi-modal large language models (MLLMs) have primarily focused on enhancing foundational capabilities, leaving a significant gap in human preference alignment. This paper introduces OmniAlign-V, a comprehensive dataset of 200K high-quality training samples featuring diverse images, complex questions, and varied response formats to improve MLLMs' alignment with human preferences. We also present MM-AlignBench, a human-annotated benchmark specifically designed to evaluate MLLMs' alignment with human values. Experimental results show that finetuning MLLMs with OmniAlign-V, using Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO), significantly enhances human preference alignment while maintaining or enhancing performance on standard VQA benchmarks, preserving their fundamental capabilities. Our datasets, benchmark, code and checkpoints have been released at https://github.com/PhoenixZ810/OmniAlign-V.
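
For readers unfamiliar with DPO, the snippet below sketches the standard per-pair DPO objective introduced by Rafailov et al. (2023) in plain Python. It is not taken from the OmniAlign-V release, and the paper's exact training configuration may differ. The function names, the use of one summed log-probability per response, and the default `beta = 0.1` are illustrative assumptions. The loss rewards the trainable policy for raising the likelihood of the preferred (chosen) response relative to a frozen reference model, more than it raises the likelihood of the rejected response.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    Each *_logp is assumed to be the summed token log-probability of a full
    response under the trainable policy or the frozen reference model.
    beta (assumed 0.1 here) scales how strongly the policy may drift from
    the reference model.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    return neg_log_sigmoid(beta * (chosen_log_ratio - rejected_log_ratio))


def batch_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_c, policy_r, ref_c, ref_r) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Policy identical to reference: loss equals log(2) ~= 0.6931.
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy favors the chosen response more than the reference does,
    # so the loss drops below log(2).
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and reference agree exactly, the loss is log 2. It falls toward zero as the policy's preference margin for the chosen response grows.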
